Abstract

Bayesian partially identified models have received growing attention in recent
years in the econometric literature, owing to their broad applications in empirical studies.
The classical Bayesian approach in this literature assumes a parametric model
by specifying an ad hoc parametric likelihood function. However, econometric models
usually identify only a set of moment inequalities, so assuming a known
likelihood function risks misspecification and may result in inconsistent
estimation of the identified set. On the other hand, moment-condition-based
likelihoods such as the limited information likelihood and the exponentially tilted
empirical likelihood guarantee consistency but lack a probabilistic interpretation.
We propose a semi-parametric Bayesian partially identified model, placing a
nonparametric prior on the unknown likelihood function. Our approach thus requires
only a set of moment conditions but still possesses a pure Bayesian interpretation.
We study the posterior of the support function, which is essential when the object
of interest is the identified set. The support function also enables us to construct
two-sided Bayesian credible sets (BCS) for the identified set. We find that, while the
BCS of the partially identified parameter is too narrow from the frequentist point of
view, that of the identified set has asymptotically correct coverage probability in the
frequentist sense. Moreover, we establish posterior consistency for both the structural
parameter and its identified set. We also develop the posterior concentration theory for
the support function and prove the semi-parametric Bernstein-von Mises theorem. Finally,
the proposed method is applied to a financial asset pricing problem.
1 Introduction



1.1 Bayesian inference for partially identified models 

Partially identified models have received extensive attention in recent years, owing
to their broad applications in econometrics. Partial identification of a structural parameter
arises when the available data and the constraints coming from economic theory only allow
us to place the parameter inside a proper subset of the parameter space. Because of the
limitations of the data generating process, the data cannot provide any information within
the set where the structural parameter is partially identified (called the identified set).

This paper aims at developing Bayesian inference for partially identified models. A 
Bayesian approach may be appealing for several reasons. First, while frequentist approaches
cannot say anything about where the parameter lies inside the identified set, the Bayesian
approach can. When informative (subjective) priors are available for the structural parameter,
the posterior density may not be flat even inside the identified set, providing information
about the parameter beyond what the data alone can reveal. When no a priori information is
available, using a uniform prior helps us estimate the true identified set. The Bayesian analysis for partially
identified models produces a posterior distribution whose support will asymptotically con- 
centrate around the true identified set. Therefore, the asymptotic behavior for the posterior 
distribution is different from that of the traditional point identified case, the latter being usu- 
ally (asymptotically) normally distributed due to the Bernstein-von Mises theorem. Hence, 
the information from the prior is washed away by the data when the structural parameter 
is identifiable. A second appealing feature of Bayesian methods arises in situations where 
we are interested only in a projection of the identified region, that is, a subset of the struc- 
tural parameter. It turns out that projecting a high dimensional identified region to a low 
dimensional space using a Bayesian approach is easier than with frequentist approaches be- 
cause this simply requires the marginalization of a joint distribution. Third, when the model 
incorporates a multidimensional parameter with some components that are identified and 
some others that are not, we can learn from the data something also about the non-identified 
parameters through the information brought by the identified parameters. Moreover, when 
(asymptotic) equivalence between Bayesian credible sets (BCS) and frequentist confidence 
sets (FCS) is established, we can take advantage of the fact that BCS are often easier to 
construct than FCS thanks to the use of Markov Chain Monte Carlo (MCMC) methods. 
Finally, frequentist inference sometimes relies heavily on point identification, and in some
cases achieving point identification requires stringent assumptions that are hard to verify.
In contrast, a Bayesian procedure bases inference on the posterior, whose construction does
not require point identification. We illustrate this point further in
the following two examples.

Example 1.1 (Functional of nonparametric instrumental regression). In a nonparametric
IV regression model $E(y|W) = E(g(X)|W)$ with instrument $W$ (e.g., Hall and Horowitz
2005, Florens and Simoni 2012), suppose we are interested in a functional $h(g)$ of $g$. The
current literature makes inference about $h(g)$ assuming its point identification. However, the
identification of $h(g)$ relies on a stringent assumption that is hard to verify (see, e.g., Severini
and Tripathi 2006 and Santos 2012). Using a Bayesian partial identification approach, in
contrast, we can put a prior on $\theta = h(g)$ directly without requiring point identification. The
deduced posterior of $\theta$ nevertheless enables statistical inference. In particular, if point
identification is indeed guaranteed, this can be inferred from the shape of the posterior distribution.
□

Example 1.2 (Quantile regression with endogenous censoring). In the model $y = X^T\theta + \epsilon$
with $\mathrm{Med}(\epsilon|X) = 0.5$, only $(I(y < c), \min(y, c), X)$ is observed for some censoring variable $c$.
In particular, the censoring may depend arbitrarily on $y$ and is thus endogenous. Although a
sufficient condition for the point identification of $\theta$ has been given in the literature (e.g., Khan
and Tamer 2009), it is stringent and hard to verify. In contrast, a Bayesian procedure
simply imposes a prior on $\theta$ and makes inference via the posterior distribution. Moreover,
the posterior can help to check whether point identification is indeed guaranteed. □



There are in general two Bayesian approaches for partially identified models currently 
developed in the literature. The first one is based on a parametric likelihood function, 
which is assumed to be known by econometricians up to a finite dimensional parameter. 
This approach has been used frequently in the literature, see e.g., Moon and Schorfheide 
(2012), Poirier (1998), Gustafson (2012), Bollinger and Hasselt (2009), Norets and Tang 
(2012) among many others. However, econometric models usually only identify a set of 
moment inequalities instead of the full likelihood function. Examples are: interval-censored 
data, interval instrumental regression, asset pricing (Chernozhukov et al. 2008), incomplete 
structural models (Menzel 2011) etc. Assuming a parametric form of the likelihood function 
is therefore ad hoc. Once the likelihood is misspecified, the posterior can be misleading. The
second approach starts from a set of moment inequalities and uses a moment-condition-based
likelihood such as the limited information likelihood (Kim 2002) or the exponentially tilted
empirical likelihood (Schennach 2005). Further references may be found in Liao and Jiang
(2010), Kitagawa (2012) and Wan (2011). This approach avoids assuming knowledge of
the true likelihood function. However, it does not have a probabilistic interpretation. The
use of moment-condition-based likelihoods makes this approach quasi-Bayesian: it uses
the Bayesian machinery for inference; see Chernozhukov and Hong (2003).

We propose a pure Bayesian procedure without assuming a parametric form of the true 
likelihood function. Instead, we place nonparametric priors on the likelihood and obtain the 
marginal posterior distribution for the partially identified parameter as well as the posterior 
for the identified set. A similar Bayesian procedure was recently used in Florens and Simoni
(2011). As a result, our procedure is semi-parametric Bayesian, involving both finite and
infinite dimensional parameters. Our approach thus requires only a set of moment conditions
but still possesses a probabilistic interpretation.

Let $\theta$ denote the partially identified structural parameter. In addition, we assume that
there is a finite dimensional nuisance parameter $\phi$ that is point identified by the data
generating process and that characterizes the identified set. In general, there are two ways in
the literature to specify a prior on $\theta$. In the moment-condition-based model, as in Kim (2002)
and Liao and Jiang (2010), the prior $\pi(\theta)$ is placed marginally and does not need to take
into account the partial identification. Hence $\pi(\theta)$ can be supported on the entire parameter
space. In contrast, in the likelihood-based model as considered by Moon and Schorfheide
(2012) and Gustafson (2012), the prior $\pi(\theta|\phi)$ is placed conditionally on $\phi$, and needs to
incorporate the partial identification structure by assuming it is supported only on the
identified set, the latter being parametrized by $\phi$. We further illustrate this difference in a simple
example of interval censoring.

Example 1.3 (Interval censored data). Suppose $Y \in \mathbb{R}$ is censored between $Y_1$ and $Y_2$.
We are interested in the structural parameter $\theta = EY$, but only $Y_1$ and $Y_2$ are observable.
Let $\phi = (\phi_1, \phi_2)' = (E(Y_1), E(Y_2))'$; then $\theta$ is partially identified on $\Theta(\phi) = [\phi_1, \phi_2]$. The
moment-condition-based approach starts from the moment inequality model $E(Y_1 - \theta) \le 0$
and $E(\theta - Y_2) \le 0$, and places a prior $\pi(\theta)$ that is supported on the entire parameter space.
In contrast, the likelihood-based approach places a prior $\pi(\theta|\phi)$ that is supported only on
$[\phi_1, \phi_2]$. Therefore, the latter's prior specification takes into account the fact that $Y$ is
censored in $[Y_1, Y_2]$, while the former does not need to. □

In this paper we specify a conditional prior $\pi(\theta|\phi)$ for $\theta$ given $\phi$, which incorporates
the partial identification structure as in the likelihood-based approach. Examples of such
priors include the uniform prior, the truncated normal prior, and many other priors that have
bounded support. The unknown likelihood function is then defined for $\phi$ only, where $\phi$ is a
point-identified nuisance parameter.

We provide a frequentist validation of our procedure. This means that we admit the 
existence of a true value of the structural parameter and the identified set, and prove that 
the posterior distribution concentrates asymptotically in a neighborhood of this true value. 
This property is known as posterior consistency and is important because it guarantees that, 
with a sufficiently large amount of data, we can almost surely recover the truth accurately. 
Lack of consistency is particularly undesirable and a Bayesian procedure should not be used 
if the corresponding posterior is inconsistent. 

1.2 Highlights of our contributions 

We highlight three distinguishing features of our approach, which also illustrate our main
contributions.



Semi-parametric Bayesian partial identification 

We endow the point identified nuisance parameter $\phi$ with a prior $\pi(\phi)$. The true
likelihood function $l_n(\phi)$ is defined on the support of $\phi$. Without assuming any parametric
form for $l_n(\cdot)$, we place a nonparametric prior $\pi(l_n)$ on the space of probability density (or
distribution) functions $l_n$. The prior specification is completed by a conditional prior $\pi(\theta|\phi)$
which takes into account the partial identification structure. Therefore, the model contains
finite dimensional parameters $(\theta, \phi)$ and an infinite dimensional parameter $l_n$, where $(\phi, l_n)$
are point-identified nuisance parameters. The marginal posterior of $\theta$ is then given by

$$p(\theta|\text{Data}) \propto \int \pi(\theta|\phi)\,\pi(\phi)\,l_n(\phi)\,\pi(l_n)\,d\phi\,dl_n.$$

Such a semi-parametric posterior requires only a set of moment inequalities, and therefore
is a robust (and pure) Bayesian procedure.
In partially identified models, inference may be carried out both for the structural
parameter $\theta$ and for the identified set. The prior specification $\pi(\theta|\phi)$ on $\theta$ plays a role only for
inference on $\theta$.

We propose two types of priors on the point-identified parameter $(\phi, l_n)$. The first
consists of a nonparametric prior on the distribution function of the data generating process,
with the Dirichlet process prior as an important example. Using this prior, the prior $\pi(\phi)$
of the parameter $\phi$ can be recovered by viewing $\phi$ as a function of the distribution
function. This prior is appealing when we have no prior information for $\phi$. In contrast,
if informative prior information for $\phi$ is available, it is more convenient to place an alternative
semi-parametric prior specified as the product of a prior on $\phi$ and a prior on the underlying
likelihood function $l_n$. This type of prior on $l_n$ is usually specified on the space of
probability density functions, and includes finite mixtures of normals and Dirichlet mixtures of
normals as examples.

For these prior schemes, we show that asymptotically $p(\theta|\text{Data})$ will be supported within
an arbitrarily small neighborhood of the true identified set, which is the notion of
posterior consistency under partial identification. Moreover, we construct the posterior for
the identified set, and show that asymptotically it concentrates within a $\sqrt{(\log n)/n}$
Hausdorff-neighborhood around the true identified set.

Support function 

Our setup is similar to that of Moon and Schorfheide (2012) in that the identified set
is completely determined by the identified nuisance parameter $\phi$, and hence can be written
as $\Theta(\phi)$. Once the posterior of $\phi$ is determined, so is that of $\Theta(\phi)$. For a definition of the
prior and posterior of $\Theta(\phi)$ we refer to Florens and Simoni (2011), who define them in terms
of capacity functionals. To make inference on $\Theta(\phi)$ we can take advantage of the fact that
when $\Theta(\phi)$ is closed and convex it is completely characterized by its support function $S_\phi(\cdot)$,
defined as

$$S_\phi(\rho) = \sup_{\theta \in \Theta(\phi)} \theta^T \rho,$$
where $\rho \in \mathbb{S}^{\dim(\theta)}$, the unit sphere. Therefore, inference on $\Theta(\phi)$ may be conveniently carried
out through inference on its support function. The posterior distribution of $S_\phi(\cdot)$ is also
determined by that of $\phi$. We show that in a general moment inequality model, the support
function has an asymptotic linear representation in a neighborhood of the true value of
$\phi$. The posterior of $S_\phi(\cdot)$ is shown to concentrate asymptotically within a $\sqrt{(\log n)/n}$
sup-norm-neighborhood around the support function of the true identified set. Moreover, we prove the
Bernstein-von Mises theorem, that is, the posterior distribution of the support function is
shown to be asymptotically normal. We also calculate the support function for a number of
interesting examples, including interval censored data, missing data, interval instrumental
regression and an asset pricing model.
Two-sided Bayesian credible sets for the identified set

We construct two types of Bayesian credible sets (BCS), one for the partially identified
parameter $\theta$ and the other for the identified set $\Theta(\phi)$. In particular, the BCS for the identified
set is constructed based on the support function and is two-sided; that is, we find sets
$\Theta(\hat\phi_M)^{-q_\tau/\sqrt{n}}$ and $\Theta(\hat\phi_M)^{q_\tau/\sqrt{n}}$, where $\hat\phi_M$ is any consistent estimator of $\phi$ (e.g., the posterior
mode of $\phi$; see Section 6 for definitions), such that with probability one,

$$P\big(\Theta(\hat\phi_M)^{-q_\tau/\sqrt{n}} \subset \Theta(\phi) \subset \Theta(\hat\phi_M)^{q_\tau/\sqrt{n}} \,\big|\, \text{Data}\big) = 1 - \tau$$

for credible level $1 - \tau$. We find that the two-sided
BCS for the identified set has asymptotically correct coverage probability, in the sense that

$$P_{D_n}\big(\Theta(\hat\phi_M)^{-q_\tau/\sqrt{n}} \subset \Theta(\phi_0) \subset \Theta(\hat\phi_M)^{q_\tau/\sqrt{n}}\big) \ge 1 - \tau + o_p(1),$$

where $P_{D_n}$ denotes the sampling probability. Therefore, $\Theta(\hat\phi_M)^{-q_\tau/\sqrt{n}}$ and $\Theta(\hat\phi_M)^{q_\tau/\sqrt{n}}$ can
also be used as frequentist confidence sets for the identified set. The notation $\Theta(\hat\phi_M)^{-q_\tau/\sqrt{n}}$,
$\Theta(\hat\phi_M)^{q_\tau/\sqrt{n}}$ and $q_\tau$ will be formally defined in Section 6. On the other hand, we find that
Moon and Schorfheide's (2012) conclusion about the BCS for the partially identified parameter $\theta$
still holds in the semi-parametric Bayesian model: the BCS for the partially
identified parameter tends to be smaller than frequentist confidence sets in large samples.

Note that we consider a fixed data generating process (DGP). The constructed BCS has
asymptotically correct coverage probability for any specific DGP, and the uniformity issue as
in Andrews and Soares (2010) is not considered. In addition, all the results on the identified
set, support function and posterior consistency for $\theta$ are valid even when point identification
is actually achieved, that is, when $\Theta(\phi)$ is a singleton.

1.3 Literature review 

There is a growing literature on Bayesian partially identified models. Besides those 
mentioned above, the list also includes Gelfand and Sahu (1999), Neath and Samaniego 
(1997), Epstein and Seo (2011), Stoye (2012), Kline (2011), etc. There is also an extensive 
literature from a frequentist point of view. A partial list includes Andrews and Guggenberger
(2009), Andrews and Soares (2010), Beresteanu, Molchanov and Molinari (2010), Bugni
(2010), Canay (2010), Chernozhukov, Hong and Tamer (2007), Chiburis (2009), Imbens and
Manski (2004), Romano and Shaikh (2010), Rosen (2008) and Stoye (2010), among others. See
Tamer (2010) for a review.

When the identified set is closed and convex, the support function is a useful tool
for characterizing its properties. The support function has therefore recently been
introduced to study partially identified models, and the literature from this perspective has
been growing rapidly, see for example, Bontemps, Magnac and Maurin (2012), Beresteanu 
and Molinari (2008), Beresteanu et al. (2012), Kaido and Santos (2012), Kaido (2012) and 
Chandrasekhar et al. (2012). 

1.4 Organization 

The paper is organized as follows. Section 2 sets up the model and proposes two types 
of prior specification on the underlying likelihood function. Section 3 establishes the posterior
consistency for the (marginal) posterior distribution of the structural parameter. Section 4
derives the posterior consistency for the identified set and provides the concentration rate.
Section 5 studies the posterior of the support function in moment inequality models. In
particular, the Bernstein-von Mises theorem for the support function is proved. Section 6
constructs the Bayesian credible sets for both the structural parameter and its identified set
and looks in detail at the missing data example. Section 7 discusses the case when point
identification is actually achieved. In this case, all the derived results on the identified set
and the support function are still valid. Section 8 applies the support function approach to
a financial asset pricing study. Finally, Section 9 concludes with further discussion. All the
proofs are given in the appendix.



2 General Setup of Bayesian Partially Identified Model 

2.1 The Model 

Econometric models often involve a finite dimensional structural parameter $\theta$. In many
cases such a structural parameter is only partially identified by the data generating process
on a non-singleton set, which we call the identified set. The goal of an econometrician is to make
inference on the partially identified parameter as well as the identified set based on the data.

Along with $\theta$, the model also includes a finite dimensional nuisance parameter $\phi \in \Phi$ that
is point identified by the data generating process. Here $\Phi$ denotes the parameter space for $\phi$.
The point identified parameter often arises naturally as it characterizes the data distribution.
In most partially identified models, the identified set is also characterized by $\phi$, hence we
denote it by $\Theta(\phi)$ to indicate that once $\phi$ is determined, so is the identified set. Let $\Theta$ denote
the parameter space for $\theta$; we assume $\Theta(\phi) \subset \Theta$.

We put a prior on $(\theta, \phi)$, which induces a prior on the identified set $\Theta(\phi)$ via $\phi$. Due to
the identification feature, for any given $\phi \in \Phi$ the conditional prior $\pi(\theta|\phi)$ is specified such
that

$$\pi(\theta \in \Theta(\phi)\,|\,\phi) = 1.$$

Our analysis focuses on the situation where $\Theta(\phi)$ is a closed and convex set for each $\phi$.
Therefore $\Theta(\phi)$ can be uniquely characterized by its support function. Let $d = \dim(\theta)$. For
any fixed $\phi$, the support function for $\Theta(\phi)$ is a function $S_\phi(\cdot) : \mathbb{S}^d \to \mathbb{R}$ such that

$$S_\phi(\rho) = \sup_{\theta \in \Theta(\phi)} \theta^T \rho,$$

where $\mathbb{S}^d$ denotes the unit sphere in $\mathbb{R}^d$. The support function plays a central role in convex
analysis since it determines all the characteristics of a convex set. For example, if $\theta \in \Theta(\phi)$,
then its $k$th component has bounds $\theta_k \in [-S_\phi(-e_k), S_\phi(e_k)]$, where $e_k$ is the $k$th standard
basis vector (a vector of all zeros, except for a one in the $k$th position). Also, $\theta \in \Theta(\phi)$ if
and only if $\rho^T\theta \le S_\phi(\rho)$ for all $\rho \in \mathbb{S}^d$.
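
The two properties just stated (componentwise bounds and the membership criterion) can be sketched numerically. The snippet below is only an illustration and not part of the procedure developed in this paper: it assumes the convex set is the convex hull of finitely many points in the plane, so that the supremum is attained at a vertex, and the names `support_function`, `component_bounds` and `contains` are hypothetical. The membership check uses a finite grid of directions and is therefore an approximation.

```python
import math

def support_function(vertices, rho):
    # S(rho) = sup over the set of theta^T rho; for a convex hull of
    # finitely many points the sup is attained at one of the vertices.
    return max(sum(t * r for t, r in zip(v, rho)) for v in vertices)

def component_bounds(vertices, k):
    # Bounds [-S(-e_k), S(e_k)] on the k-th coordinate of any point in the set.
    d = len(vertices[0])
    e_k = [1.0 if j == k else 0.0 for j in range(d)]
    minus_e_k = [-x for x in e_k]
    return -support_function(vertices, minus_e_k), support_function(vertices, e_k)

def contains(vertices, theta, n_dirs=360, tol=1e-9):
    # Approximate test of "rho^T theta <= S(rho) for all unit rho" in R^2,
    # checked only on a grid of n_dirs directions.
    for i in range(n_dirs):
        angle = 2.0 * math.pi * i / n_dirs
        rho = (math.cos(angle), math.sin(angle))
        if theta[0] * rho[0] + theta[1] * rho[1] > support_function(vertices, rho) + tol:
            return False
    return True

# Illustrative set: the rectangle [0, 1] x [0, 2].
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (0.0, 2.0)]
bounds_first = component_bounds(square, 0)
inside = contains(square, (0.5, 1.0))
outside = contains(square, (1.5, 1.0))
```

For a general closed convex set the supremum would have to be computed by other means (for instance a closed form, as in the examples that follow); the vertex shortcut is specific to this polytope illustration.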

Characterization and frequentist estimation of the identified set through its support
function has been previously proposed by Bontemps et al. (2011) and Beresteanu et al. (2012)
and also used by Kaido and Santos (2011) among others. It is also one of the essential
objects for our Bayesian inference. To the best of our knowledge, a Bayesian estimation of
the support function of the identified set has not been proposed in the literature so far.

Similar to $\Theta(\phi)$, we put a prior on $S_\phi(\cdot)$ via the prior on $\phi$. In this paper we investigate
the asymptotic frequentist properties of the posterior distribution of the support function, as
well as those of $\theta$ and of $\Theta(\phi)$, including the posterior concentration rates and the
Bernstein-von Mises theorem as in Bickel and Kleijn (2012). In addition, we carry out Bayesian
inference by constructing two-sided Bayesian credible sets for the identified set $\Theta(\phi)$ based
on the support function.

Before formalizing our Bayesian setup, let us present a few examples that have received
much attention in the literature on partially identified econometric models.

Example 2.1 (Interval censored data - continued). Let $(Y, Y_1, Y_2)$ be a 3-dimensional
random vector such that $Y \in [Y_1, Y_2]$ with probability 1. The random variables $Y_1$ and $Y_2$
are observed while $Y$ is unobservable (see, e.g., Moon and Schorfheide 2012). We denote
$\theta = E(Y)$ and $\phi = (\phi_1, \phi_2)' := (E(Y_1), E(Y_2))'$. Therefore, we have the following identified
set for $\theta$: $\Theta(\phi) = [\phi_1, \phi_2]$. The support function for $\Theta(\phi)$ is easy to derive:

$$S_\phi(1) = \phi_2, \qquad S_\phi(-1) = -\phi_1.$$

□



Example 2.2 (Interval regression model). The regression model with interval censoring has
been studied by, for example, Haile and Tamer (2003). Let $(Y, Y_1, Y_2)$ be a 3-dimensional
random vector such that $Y \in [Y_1, Y_2]$ with probability 1. The random variables $Y_1$ and $Y_2$
are observed while $Y$ is unobservable. Assume that

$$Y = X^T\theta + \epsilon,$$

where $X$ is a vector of observable regressors. In addition, assume there is a $d$-dimensional
vector of nonnegative exogenous variables $Z$ such that $E(Z\epsilon) = 0$. Here $Z$ can be either a
vector of instrumental variables when $X$ is endogenous, or a nonnegative transformation of
$X$ when $X$ is exogenous. It follows that

$$E(ZY_1) \le E(ZY) = E(ZX^T)\theta \le E(ZY_2). \qquad (2.1)$$

We denote $\phi = (\phi_1, \phi_2, \phi_3)$ with $(\phi_1^T, \phi_3^T) = (E(ZY_1)^T, E(ZY_2)^T)$ and $\phi_2 = E(ZX^T)$. Then
the identified set for $\theta$ is given by $\Theta(\phi) = \{\theta \in \Theta : \phi_1 \le \phi_2\theta \le \phi_3\}$. Suppose $\phi_2^{-1}$ exists. The
support function for $\Theta(\phi)$ is given by (for $\mathrm{sgn}(x) = I(x > 0) - I(x < 0)$)¹

$$S_\phi(\rho) = \rho^T\phi_2^{-1}\left(\frac{\phi_1 + \phi_3}{2}\right) + \alpha_\rho^T\left(\frac{\phi_3 - \phi_1}{2}\right), \qquad \rho \in \mathbb{S}^d,$$

where $\alpha_\rho = \big((\rho^T\phi_2^{-1})_1\,\mathrm{sgn}((\rho^T\phi_2^{-1})_1), \ldots, (\rho^T\phi_2^{-1})_d\,\mathrm{sgn}((\rho^T\phi_2^{-1})_d)\big)^T$.
□

¹ See the Appendix for detailed derivations of the support function in this example. Similar results but in
a slightly different form are presented in Bontemps et al. (2012).
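
As a sanity check on the closed form in Example 2.2, the following sketch compares it with a brute-force computation for $d = 2$. It rests on the observation that $\Theta(\phi)$ is the image of the box $[\phi_1, \phi_3]$ under $\phi_2^{-1}$, so a linear function attains its supremum at the image of a corner of the box, and on the fact that each component of $\alpha_\rho$ equals $|(\rho^T\phi_2^{-1})_j|$. The numerical values of $\phi$ and all function names are illustrative assumptions, not taken from the paper.

```python
import itertools
import math

def inverse_2x2(m):
    # Inverse of a 2x2 matrix given as nested lists.
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("phi_2 must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def row_times_matrix(rho, m):
    # Row vector rho^T times the 2x2 matrix m.
    return [rho[0] * m[0][j] + rho[1] * m[1][j] for j in range(2)]

def support_closed_form(phi1, phi2, phi3, rho):
    c = row_times_matrix(rho, inverse_2x2(phi2))   # rho^T phi_2^{-1}
    alpha = [abs(cj) for cj in c]                  # c_j * sgn(c_j) = |c_j|
    mid = [(lo + hi) / 2 for lo, hi in zip(phi1, phi3)]
    half = [(hi - lo) / 2 for lo, hi in zip(phi1, phi3)]
    return (sum(cj * m for cj, m in zip(c, mid))
            + sum(a * h for a, h in zip(alpha, half)))

def support_by_corners(phi1, phi2, phi3, rho):
    # rho^T theta = (rho^T phi_2^{-1}) u with u = phi_2 theta in the box,
    # so it suffices to maximise over the corners u of the box.
    c = row_times_matrix(rho, inverse_2x2(phi2))
    corners = itertools.product(*zip(phi1, phi3))
    return max(c[0] * u[0] + c[1] * u[1] for u in corners)

# Hypothetical values with phi_1 <= phi_3 componentwise.
phi1, phi3 = [0.0, 1.0], [1.0, 2.0]
phi2 = [[2.0, 1.0], [1.0, 3.0]]
directions = [(math.cos(t), math.sin(t)) for t in (0.0, 0.7, 1.9, 3.1, 4.4, 5.8)]
max_gap = max(
    abs(support_closed_form(phi1, phi2, phi3, r) - support_by_corners(phi1, phi2, phi3, r))
    for r in directions
)
```

The two computations agree up to floating-point rounding on these directions, which is consistent with the reconstruction of the formula above.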
Example 2.3 (Missing data). Consider a bivariate random vector $(Y, M)$ where $M$ is a
binary random variable which takes the value $M = 1$ when $Y$ is missing and $0$ otherwise.
The parameter of interest is the marginal distribution $F_Y$ of $Y$: $\theta = F_Y(y)$. This problem
without the missing-at-random assumption has been extensively studied in the literature;
see, for example, Manski and Tamer (2002) and Manski (2003). Let $F$ and $F_M$ denote
respectively the joint distribution of $(Y, M)$ and the marginal distribution of $M$. Moreover,
$F_{Y|M}$ denotes the conditional distribution of $Y$ given $M$. By the law of total probability,

$$\theta = F_{Y|M}(y|M = 0)F_M(M = 0) + F_{Y|M}(y|M = 1)F_M(M = 1).$$

Since $F_{Y|M}(y|M = 1)$ cannot be recovered from the data, the empirical evidence only
partially identifies $\theta$, and $\theta$ is characterized by the following moment restrictions:

$$F(y, M = 0) \le \theta \le F(y, M = 0) + F_M(M = 1).$$

Here $\phi = (F(y, M = 0), F_M(M = 1)) = (\phi_1, \phi_2)$. The identified set is $\Theta(\phi) = [\phi_1, \phi_1 + \phi_2]$,
and its support function is $S_\phi(1) = \phi_1 + \phi_2$, $S_\phi(-1) = -\phi_1$. □

2.2 Semi-parametric Bayesian Setup

Let $F$ denote the distribution function of the observed data, which is point identified
by the data generating process. In a parametric Bayesian partially identified model as in
Poirier (1998), Gustafson (2005, 2012) and Moon and Schorfheide (2012), $F$ is linked with
a known likelihood function for $\phi$. Therefore the model is parametric and one does not put
priors on $F$. For example, in the interval censored data Example 2.1, suppose we know that $Y_1$
and $Y_2$ are jointly normal with mean $(\phi_1, \phi_2)$ and a known covariance matrix $\Sigma$. Then the
likelihood function is given by

$$l(\phi) = \det(2\pi\Sigma)^{-n/2}\exp\left(-\frac{1}{2}\sum_{i=1}^n (Y_{1i} - \phi_1, Y_{2i} - \phi_2)\,\Sigma^{-1}\,(Y_{1i} - \phi_1, Y_{2i} - \phi_2)^T\right),$$

where $\{(Y_{1i}, Y_{2i})\}_{i=1}^n$ is a set of i.i.d. realizations of $(Y_1, Y_2)$. Then $F$ is the cdf of a bivariate
normal distribution with mean vector $\phi$ and covariance $\Sigma$. The standard (parametric) Bayesian
approach for a partially identified model proceeds by specifying a joint prior distribution
$\pi(\theta, \phi)$ and obtains the marginal posterior for $\theta$:

$$p(\theta|\text{Data}) \propto \int_\Phi \pi(\theta, \phi)\,l(\phi)\,d\phi.$$

However, as for usual point identified models, assuming a known likelihood function may
suffer from a model specification problem and may lead to very misleading results. Instead,
econometric applications often involve only a set of moment conditions as in (2.1). This gives
rise to the so-called moment inequality models, e.g., Chernozhukov, Hong and Tamer (2007),
Bugni (2010), Liao and Jiang (2010), Andrews and Soares (2010), Kaido and Santos (2011),
and many other references therein. A parametric form of the likelihood function and of $F$ is
unavailable in these models, and ad hoc assumptions that make the model parametric can
result in severely misleading conclusions.
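
To illustrate what working from moment restrictions alone looks like, without any likelihood, the bounds of the missing data Example 2.3 can be estimated by sample frequencies. The sketch below is a plug-in illustration under our own assumptions (a toy sample, a hypothetical function name `missing_data_bounds`, and observations encoded as `(value, missing)` pairs); it is not the Bayesian procedure of this paper.

```python
def missing_data_bounds(sample, y):
    # sample: list of (value, missing) pairs; value is ignored when missing is True.
    # Returns plug-in estimates of the bounds [phi_1, phi_1 + phi_2] on F_Y(y),
    # where phi_1 = F(y, M = 0) and phi_2 = F_M(M = 1).
    n = len(sample)
    phi1_hat = sum(1 for value, missing in sample if not missing and value <= y) / n
    phi2_hat = sum(1 for _, missing in sample if missing) / n
    return phi1_hat, phi1_hat + phi2_hat

# Toy data: four observations, one of them missing.
sample = [(1.0, False), (2.0, False), (None, True), (3.0, False)]
lower, upper = missing_data_bounds(sample, y=2.0)
```

The width of the estimated interval equals the sample fraction of missing observations, which reflects the fact that nothing is assumed about the distribution of $Y$ among the missing cases.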
A much more robust approach is to proceed without assuming a parametric form for the
likelihood function, but put a prior on $F$ instead. This yields the semi-parametric Bayesian
setup. The statistical model therefore contains three parameters: the structural parameter
of interest $\theta$, a finite dimensional nuisance parameter $\phi$ which can be point identified by the
DGP, and an infinite dimensional nuisance parameter $F$ which characterizes the distribution
of the data.

A further distinction among the parameters may be made on the basis of identification: the
identified parameters $(\phi, F)$, which characterize the sampling distribution of the observable
random variables, and the partially identified parameter $\theta$, which is linked to the sampling
distribution through $\phi$. We have to take this difference into account when we construct the
prior distribution for the model parameters. Therefore, the prior distribution is naturally
decomposed into a marginal prior for the identified parameter and a conditional prior for $\theta$
given the identified parameter such that

$$\pi(\theta \in \Theta(\phi)\,|\,\phi) = 1.$$

We specify a conditional prior distribution for $\theta$ given $\phi$ taking the form

$$\pi(\theta|\phi) \propto I_{\theta \in \Theta(\phi)}\, g(\theta),$$

where $g(\cdot)$ is some probability density function with respect to the Lebesgue measure and
$I_{\theta \in \Theta(\phi)}$ is the indicator function of $\Theta(\phi)$, which takes the value 1 if $\theta \in \Theta(\phi)$ and 0 otherwise.
By construction this prior puts all its mass on $\Theta(\phi)$, for all $\phi \in \Phi$.
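
One simple way to draw from a prior of this form is rejection sampling: draw from $g$ and keep the draw only if it falls in the identified set. The sketch below assumes, purely for illustration, that $g$ is the standard normal density and that the identified set is an interval $[\phi_1, \phi_2]$; the function names are hypothetical, and rejection sampling is our choice of technique rather than something prescribed by the paper.

```python
import random

def sample_conditional_prior(phi, draw_from_g, in_identified_set, rng, max_tries=100_000):
    # Draw theta from the density proportional to 1{theta in Theta(phi)} * g(theta)
    # by proposing from g and rejecting proposals outside Theta(phi).
    for _ in range(max_tries):
        theta = draw_from_g(rng)
        if in_identified_set(theta, phi):
            return theta
    raise RuntimeError("g puts too little mass on the identified set")

def in_interval(theta, phi):
    # Identified set of the interval censored example: [phi_1, phi_2].
    return phi[0] <= theta <= phi[1]

def standard_normal(rng):
    # Illustrative choice of g.
    return rng.gauss(0.0, 1.0)

rng = random.Random(0)
phi = (-0.5, 1.0)
draws = [sample_conditional_prior(phi, standard_normal, in_interval, rng) for _ in range(1000)]
```

With these choices the draws follow a truncated normal prior on $[\phi_1, \phi_2]$. Rejection becomes inefficient when $g$ places little mass on the identified set, in which case a direct truncated sampler would be preferable.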

Below we describe two possible ways to specify the prior on $(\phi, F)$: a fully
nonparametric prior and a semi-parametric prior. The first prior scheme consists in placing a fully
nonparametric prior on $F$, which induces a prior on $\phi$ through a transformation $\phi = \phi(F)$.
When there is more informative prior information for $\phi$ directly, it is more convenient to
place a prior on $(\phi, \eta)$, where $\eta$ is an infinite-dimensional nuisance parameter (often a density
function) that is a priori independent of $\phi$ and that characterizes $F$. The prior on $(\phi, F)$ is
then deduced from the prior on $(\phi, \eta)$.

Below we formally define these two prior specifications. An illustrative example is given
in Section 2.3. Let $X$ denote the observable random variable for which we have $n$ i.i.d.
observations $D_n = \{X_i\}_{i=1}^n$. Let $(\mathcal{X}, \mathfrak{B}_X, F)$ denote a probability space in which $X$ takes
values. Let $\mathcal{F}$ denote the set of probability measures on $(\mathcal{X}, \mathfrak{B}_X)$, which is also the parameter
space of $F$.

2.2.1 Nonparametric prior 

Since $\phi$ is point identified, we assume it can be rewritten as a measurable function of $F$
as $\phi = \phi(F)$, for instance $\phi = E(X) = \int x\,F(dx)$. A possible way to construct the prior
distribution consists of specifying a nonparametric prior distribution for $F$ and then deducing
from it the prior distribution for $\phi$ via $\phi(F)$. The Bayesian experiment is

$$X|F \sim F, \qquad F \sim \pi(F), \qquad \theta\,|\,\phi = \phi(F) \sim \pi(\theta|\phi(F)).$$

The prior distribution $\pi(F)$ is a distribution on $\mathcal{F}$. Examples of such a prior include
Dirichlet process priors (Ferguson 1973) and Polya tree priors (Lavine 1992). The case
where $\pi(F)$ is a Dirichlet process prior in partially identified models has been proposed
by Florens and Simoni (2011).

Conditionally on $\phi$, the data are completely uninformative about $\theta$: the prior distribution
of $\theta$ is revised by the data only through the information brought by the identified parameter
$\phi(F)$. Indeed, since $\phi(F)$ is identified, it is straightforward to show that the posterior of $\theta$
conditional on $\phi(F)$ satisfies

$$p(\theta|\phi(F), D_n) = \pi(\theta|\phi(F))$$

(see Poirier (1998), who called the data conditionally uninformative for $\theta$ given $\phi$).
Let $p(F|D_n)$ denote the marginal posterior of $F$ which, by abuse of notation, can be written
$p(F|D_n) \propto \pi(F)\prod_{i=1}^n F(X_i)$. The posterior distributions of $\phi$, $\Theta(\phi)$ and $S_\phi(\cdot)$ are deduced from
the posterior of $F$. Then, for any measurable set $B \subset \Theta$, the marginal posterior probability
of $\theta$ is given by, averaging over $F$:

$$P(\theta \in B|D_n) = \int_{\mathcal{F}} P(\theta \in B|\phi(F), D_n)\,p(F|D_n)\,dF
= \int_{\mathcal{F}} \pi(\theta \in B|\phi(F))\,p(F|D_n)\,dF = E[\pi(\theta \in B|\phi(F))\,|\,D_n], \qquad (2.2)$$

where the conditional expectation is taken with respect to the posterior of $F$. The
corresponding marginal posterior density function of $\theta$ will be denoted by $p(\theta|D_n)$.
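
Equation (2.2) suggests a simple Monte Carlo scheme: draw $F$ from its posterior, map each draw to $\phi(F)$, and average $\pi(\theta \in B|\phi(F))$ over the draws. The sketch below does this for the interval censored example under assumptions that are ours and not the paper's: the posterior of $F$ is approximated by the Bayesian bootstrap (Dirichlet$(1, \ldots, 1)$ weights on the observed points, commonly described as the noninformative limit of a Dirichlet process posterior), the conditional prior is uniform on $[\phi_1, \phi_2]$, and the data are simulated. All function names are hypothetical.

```python
import random

def bayesian_bootstrap_weights(n, rng):
    # Dirichlet(1, ..., 1) weights obtained by normalising i.i.d. Exp(1) draws.
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [x / total for x in g]

def posterior_phi_draws(y1, y2, n_draws, rng):
    # Each draw of F is a reweighting of the sample; phi(F) = (E_F Y1, E_F Y2).
    draws = []
    for _ in range(n_draws):
        w = bayesian_bootstrap_weights(len(y1), rng)
        phi1 = sum(wi * a for wi, a in zip(w, y1))
        phi2 = sum(wi * b for wi, b in zip(w, y2))
        draws.append((phi1, phi2))
    return draws

def uniform_prior_mass(phi, lo, hi):
    # pi(theta in [lo, hi] | phi) when the conditional prior is uniform on [phi1, phi2].
    phi1, phi2 = phi
    overlap = max(0.0, min(hi, phi2) - max(lo, phi1))
    return overlap / (phi2 - phi1)

def posterior_probability(phi_draws, lo, hi):
    # Monte Carlo version of E[pi(theta in B | phi(F)) | D_n] with B = [lo, hi].
    return sum(uniform_prior_mass(phi, lo, hi) for phi in phi_draws) / len(phi_draws)

# Simulated interval censored data: Y1 = Y - 0.5 and Y2 = Y + 0.5.
rng = random.Random(1)
y = [rng.uniform(0.0, 2.0) for _ in range(200)]
y1 = [v - 0.5 for v in y]
y2 = [v + 0.5 for v in y]
phi_draws = posterior_phi_draws(y1, y2, n_draws=500, rng=rng)
prob_middle = posterior_probability(phi_draws, 0.8, 1.2)
```

Because the conditional prior is never updated by the data given $\phi$, the only randomness in the average comes from the posterior draws of $\phi(F)$, which mirrors the structure of (2.2).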



2.2.2 Semi-parametric prior 

Alternatively, instead of modeling $F$ nonparametrically, we could reformulate the model
and parameterize the sampling distribution $F$ in terms of a finite-dimensional parameter
$\phi \in \Phi$ and a nuisance parameter $\eta \in \mathcal{P}$, where $\mathcal{P}$ is an infinite-dimensional measurable
space. Therefore, $\mathcal{F} = \{F_{\phi,\eta};\ \phi \in \Phi,\ \eta \in \mathcal{P}\}$. When we consider the frequentist properties of
the posterior distribution, we assume there is a fixed true value for $F$, denoted by $F_0$. Since
both $\phi$ and $F$ are identified, there exist unique $\phi_0 \in \Phi$ and $\eta_0 \in \mathcal{P}$ such that $F_0 = F_{\phi_0,\eta_0}$,
where $\phi_0$ and $\eta_0$ denote the true values of $\phi$ and $\eta$. Denote by $l_n(\phi, \eta)$ the model's likelihood
function.

One of the appealing features of this semi-parametric approach is that it allows us to
impose a prior $\pi(\phi)$ directly on the identified parameter $\phi$, which is convenient whenever we
have good prior information regarding $\phi$. In contrast, with a nonparametric prior specification
it may be inconvenient to incorporate subjective prior information.

For instance, in the interval censored data example, we can write

$$Y_1 = \phi_1 + u, \qquad Y_2 = \phi_2 + v, \qquad u \sim f_1, \qquad v \sim f_2,$$

where both $u$ and $v$ are random errors with zero mean and unknown density functions
$f_1$ and $f_2$ such that $u \perp v \mid f_1, f_2$ and the supports of the corresponding distributions of
$Y_1, Y_2 \mid \phi_1, \phi_2, f_1, f_2$ are disjoint². Then $\eta = (f_1, f_2)$, and the likelihood function is

$$l_n(\phi, \eta) = \prod_{i=1}^{n} f_1(Y_{1i} - \phi_1)\, f_2(Y_{2i} - \phi_2).$$

We put priors on $(\phi, f_1, f_2)$. This is a location model studied for instance by Ghosal et al.
(1999) and Amewou-Atisso et al. (2003). Examples of priors on density functions $f_1$ and $f_2$
include mixture of Dirichlet process priors, Gaussian process priors, etc.

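The joint prior distribution $\pi(\theta, \phi, \eta)$ is naturally decomposed as

$$\pi(\theta, \phi, \eta) = \pi(\theta \mid \phi) \times \pi(\phi, \eta). \tag{2.3}$$

We place an independent prior on $(\phi, \eta)$ as $\pi(\phi, \eta) = \pi(\phi)\pi(\eta)$. Therefore, the Bayesian
experiment is

$$X \mid \phi, \eta \sim F_{\phi,\eta}, \qquad (\phi, \eta) \sim \pi(\phi, \eta) = \pi(\phi) \times \pi(\eta), \qquad \theta \mid \phi, \eta \sim \pi(\theta \mid \phi).$$

The posterior distribution of $\phi$ has a density function given by

$$p(\phi \mid D_n) \propto \int_{\mathcal{P}} \pi(\phi, \eta)\, l_n(\phi, \eta)\, d\eta. \tag{2.4}$$

Then the marginal posterior of $\theta$ is, for any measurable set $B \subset \Theta$:

$$P(\theta \in B \mid D_n) \propto \int_{\Phi} \int_{\mathcal{P}} \pi(\theta \in B \mid \phi)\, \pi(\phi, \eta)\, l_n(\phi, \eta)\, d\eta\, d\phi. \tag{2.5}$$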







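Moreover, the corresponding posterior density function is $p(\theta \mid D_n) = \int_{\Phi} \pi(\theta \mid \phi)\, p(\phi \mid D_n)\, d\phi$,
where $p(\phi \mid D_n)$ has been defined in (2.4). The posteriors of $\Theta(\phi)$ and $S_\phi(\cdot)$ are deduced from
that of $\phi$. Suppose for example we are interested in whether $\Theta(\phi) \cap A$ is an empty set for
some $A \subset \Theta$; we then look at the posterior probability

$$P(\Theta(\phi) \cap A = \emptyset \mid D_n) = \frac{\int_{\{\phi:\ \Theta(\phi) \cap A = \emptyset\}} \int_{\mathcal{P}} \pi(\phi)\pi(\eta)\, l_n(\phi, \eta)\, d\eta\, d\phi}{\int_{\Phi} \int_{\mathcal{P}} \pi(\phi)\pi(\eta)\, l_n(\phi, \eta)\, d\eta\, d\phi}.$$

The finite-dimensional posterior distribution of the support function $S_\phi(\cdot)$ is the distribution
$P(S_\phi(p_i) \in A_i, \text{ for } 1 \le i \le k)$, $k \in \mathbb{N}$, for every $(p_1, \ldots, p_k)$ such that $p_i \in \mathbb{S}^d$, $i = 1, \ldots, k$,
and for every product of measurable sets $A_i$ in $\mathbb{R}$.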



2 In order to implement this we have two possibilities. Let $[\underline{u}, \bar{u}]$ and $[\underline{v}, \bar{v}]$ denote the supports of $f_1$ and
$f_2$, and $[\underline{\phi}_1, \bar{\phi}_1]$ and $[\underline{\phi}_2, \bar{\phi}_2]$ denote the supports of $\phi_1$ and $\phi_2$, respectively. First, we can specify a conditional
prior $\pi(f_1, f_2 \mid \phi_1, \phi_2)$ such that $\bar{u} + \phi_1 < \underline{v} + \phi_2$. A second possibility is to specify an independent prior on
$(f_1, f_2)$ and on $(\phi_1, \phi_2)$ such that $\bar{u} < \underline{v}$ and $\bar{\phi}_1 < \underline{\phi}_2$.

Example 2.4 (Interval regression model - continued). Consider Example 2.2 where
$(\phi_1, \phi_2, \phi_3) = (E(ZY_1), E(ZX^T), E(ZY_2))$. Write

$$ZY_1 = \phi_1 + u_1, \qquad ZY_2 = \phi_3 + u_3, \qquad \mathrm{vec}(ZX^T) = \mathrm{vec}(\phi_2) + u_2,$$

where $u_1$, $u_2$ and $u_3$ are correlated and their joint unknown probability density function is
$\eta(u_1, u_2, u_3)$. The likelihood function is then

$$l_n(\phi, \eta) = \prod_{i=1}^{n} \eta\big(Z_i Y_{1i} - \phi_1,\ Z_i Y_{2i} - \phi_3,\ \mathrm{vec}(Z_i X_i^T) - \mathrm{vec}(\phi_2)\big).$$

□

Many nonparametric priors can be used for $\pi(\eta)$ in the location model of the type of
Example 2.4 where $l_n(\phi, \eta) = \prod_{i=1}^{n} \eta(X_i - \phi)$, or of the type of the interval data example.
The next examples show three possible ways of constructing priors $\pi(\eta)$ on probability
density functions.

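Example 2.5. The finite mixture of normals (e.g., Lindsay and Basak (1993), Ray and
Lindsay (2005)) assumes $\eta$ to take the form

$$\eta(x) = \sum_{i=1}^{k} w_i\, h(x; \mu_i, \Sigma_i)$$

where $h(x; \mu_i, \Sigma_i)$ is the density of a multivariate normal distribution with mean $\mu_i$ and
variance $\Sigma_i$ and $\{w_i\}_{i=1}^{k}$ are unknown weights such that $\sum_{i=1}^{k} w_i \mu_i = 0$. Then $\int \eta(x)\, x\, dx =
\sum_{i=1}^{k} w_i \int h(x; \mu_i, \Sigma_i)\, x\, dx = 0$. We impose the prior $\pi(\eta) = \pi(\{\mu_l, \Sigma_l, w_l\}_{l=1}^{k})$, then

$$p(\phi \mid D_n) \propto \int \pi(\phi)\pi(\eta) \prod_{i=1}^{n} \eta(X_i - \phi)\, d\eta
= \int \pi(\phi) \prod_{i=1}^{n} \sum_{j=1}^{k} w_j\, h(X_i - \phi; \mu_j, \Sigma_j)\, \pi(\{\mu_l, \Sigma_l, w_l\}_{l=1}^{k})\, dw_j\, d\mu_j\, d\Sigma_j.$$

□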
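The likelihood in Example 2.5 can be sketched in code. The following is a minimal illustration, not the paper's implementation: it assumes a univariate error, weights that are non-negative and sum to one, and the hypothetical helper names `center_means`, `mixture_density` and `log_likelihood`. It evaluates $\log l_n(\phi, \eta) = \sum_i \log \eta(X_i - \phi)$ for a mixture whose means are recentred so that $\sum_i w_i \mu_i = 0$.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density h(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def center_means(weights, means):
    """Shift the component means so that sum_i w_i * mu_i = 0.

    Assumes the weights are non-negative and sum to one.
    """
    shift = sum(w * m for w, m in zip(weights, means))
    return [m - shift for m in means]


def mixture_density(x, weights, means, variances):
    """eta(x) = sum_i w_i * h(x; mu_i, Sigma_i)."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def log_likelihood(phi, data, weights, means, variances):
    """log l_n(phi, eta) = sum_i log eta(X_i - phi)."""
    return sum(math.log(mixture_density(x - phi, weights, means, variances)) for x in data)
```

A posterior sampler would combine this log-likelihood with log-priors on $\phi$ and on the mixture parameters; that step is left out here because the source does not fix a particular sampling scheme.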



Example 2.6. Dirichlet mixture of normals (e.g., Ghosal, Ghosh and Ramamoorthi (1999),
Ghosal and van der Vaart (2001), Amewou-Atisso et al. (2003)) assumes

$$\eta(x) = \int h(x - z; 0, \Sigma)\, dH(z)$$

where $h(x; 0, \Sigma)$ is the density of a multivariate normal distribution with mean zero and
variance $\Sigma$ and $H$ is a probability distribution such that $\int z H(z)\, dz = 0$. Then $\int x\, \eta(x)\, dx =
0$. To place a prior on $\eta$, we let $H$ have the Dirichlet process prior distribution $D_\alpha =
\mathcal{D}ir(\nu_0, Q_0)$ where $\alpha$ is a finite positive measure, $\nu_0 = \alpha(\mathcal{X}) \in \mathbb{R}_+$ and $Q_0 = \alpha/\alpha(\mathcal{X})$ is a base
probability on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ such that $Q_0(x) = 0$, $\forall x \in (\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$. In addition, we place a prior on
$\Sigma$ independent of the prior on $H$. Then

$$p(\phi \mid D_n) \propto \int \pi(\phi)\, \pi(\Sigma)\, D_\alpha(H) \prod_{i=1}^{n} \int h(X_i - \phi - z; 0, \Sigma)\, dH(z)\, d\Sigma\, dH.$$

□

Example 2.7. Random Bernstein polynomials (e.g., Walker et al. (2007) and Ghosal (2001))
admit a density function

$$\eta(x) = \sum_{j=1}^{k} \big[H(j/k) - H((j-1)/k)\big]\, \mathrm{Be}(x; j, k - j + 1),$$

where $\mathrm{Be}(x; a, b)$ stands for the beta density with parameters $a, b > 0$ and $H$ is a random
distribution function with prior distribution assumed to be a Dirichlet process. Moreover,
the parameter $k$ is also random with a prior distribution independent of the prior on $H$.
Then $p(\phi \mid D_n) \propto \int \pi(\phi) \prod_{i=1}^{n} \eta(X_i - \phi)\, \pi(H)\, \pi(k)\, dH\, dk$. □

Besides, other commonly used priors are wavelet expansions (Rivoirard and Rousseau 
(2012)), Polya tree priors (Lavine (1992)), Gaussian process priors (van der Vaart and van 
Zanten (2008), Castillo (2008)), etc. 



2.3 Interval censored data: an example 

For illustration purposes, we consider a simple version of the interval censored data
example 2.1 where $Y_2 = Y_1 + 1$ and only $Y_1$ is observable, i.e. $Y_1 = X$ in our general
notation. Let $\phi = EY_1$ and $\theta = EY$, then the identified set is $\Theta(\phi) = [\phi, \phi + 1]$. Let
$F$ denote the marginal distribution of $Y_1$. Then a more formal way to write $\phi$ is
$\phi = \phi(F) = E(Y_1 \mid F)$.

Let us specify a Dirichlet process prior for $F$: $\pi(F) = \mathcal{D}ir(\nu_0, Q_0)$, where $\nu_0 \in \mathbb{R}_+$ and
$Q_0$ is a base probability on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ such that $Q_0(x) = 0$, $\forall x \in (\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$. By using the
stick-breaking representation (see Sethuraman (1994)), the deduced prior distribution of
the transformation $\phi(F)$ is

$$\pi(\phi \in A) = P\Big(\sum_{j=1}^{\infty} \alpha_j \xi_j \in A\Big), \qquad \forall A \subset \Phi$$

where $\{\xi_j\}_{j \ge 1}$ denote independent drawings from $Q_0$, $\alpha_j = v_j \prod_{l=1}^{j-1} (1 - v_l)$ with $\{v_l\}_{l \ge 1}$
independent drawings from a Beta distribution $\mathrm{Be}(1, \nu_0)$, and $\{v_l\}_{l \ge 1}$ are independent of
$\{\xi_j\}_{j \ge 1}$. The posterior distribution of $F$ is still a Dirichlet process: $p(F \mid D_n) = \mathcal{D}ir(\nu_n, Q_n)$,
with $\nu_n = \nu_0 + n$, and $Q_n = \frac{\nu_0}{\nu_0 + n} Q_0 + \frac{n}{\nu_0 + n} F_n$ where $F_n$ is the empirical distribution of the
sample $(Y_{11}, \ldots, Y_{1n})$. The posterior distribution of the transformation $\phi(F)$ is

$$P(\phi \in A \mid D_n) = P\Big(\rho \sum_{j=1}^{n} \beta_j Y_{1j} + (1 - \rho) \sum_{j=1}^{\infty} \alpha_j \xi_j \in A \,\Big|\, D_n\Big), \qquad \forall A \subset \Phi,$$



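where $\rho$ is drawn from a Beta distribution $\mathrm{Be}(n, \nu_0)$ independently of the other quantities
and $(\beta_1, \ldots, \beta_n)$ are drawn from a Dirichlet distribution with parameters $(1, \ldots, 1)$ on the
simplex $S_{n-1}$ of dimension $(n - 1)$. With the prior $\pi(\theta \mid \phi) \propto I_{\theta \in \Theta(\phi)}\, g(\theta)$, the marginal
posterior density function of $\theta$ evaluated at some fixed $\theta$ is

$$p(\theta \mid D_n) \propto \int I(\theta \in [\phi, \phi + 1])\, g(\theta)\, p(F \mid D_n)\, dF
= g(\theta)\, P\big(\phi(F) \le \theta \le \phi(F) + 1 \mid D_n\big)
= g(\theta)\, P\Big(\theta - 1 \le \rho \sum_{j=1}^{n} \beta_j Y_{1j} + (1 - \rho) \sum_{j=1}^{\infty} \alpha_j \xi_j \le \theta \,\Big|\, D_n\Big).$$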
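The posterior representation above lends itself to direct simulation. The sketch below is only an illustration under stated assumptions: the infinite stick-breaking sum is truncated at `trunc` terms (an approximation), the base measure $Q_0$ is supplied by a hypothetical `base_draw` callback, and the function names are not from the paper. It draws $\phi$ from its Dirichlet process posterior and estimates the unnormalised density $g(\theta) P(\theta - 1 \le \phi \le \theta \mid D_n)$ by Monte Carlo.

```python
import random


def draw_phi(y, nu0, base_draw, rng, trunc=100):
    """One draw of phi = rho * sum_j beta_j Y_1j + (1 - rho) * sum_j alpha_j xi_j."""
    n = len(y)
    rho = rng.betavariate(n, nu0)                      # rho ~ Be(n, nu0)
    gammas = [rng.gammavariate(1.0, 1.0) for _ in y]   # normalised: Dirichlet(1, ..., 1)
    total = sum(gammas)
    data_part = sum(g * yi for g, yi in zip(gammas, y)) / total
    prior_part, stick = 0.0, 1.0
    for _ in range(trunc):                             # truncated stick-breaking sum
        v = rng.betavariate(1.0, nu0)                  # v_j ~ Be(1, nu0)
        prior_part += stick * v * base_draw(rng)       # alpha_j * xi_j with xi_j ~ Q0
        stick *= 1.0 - v
    return rho * data_part + (1.0 - rho) * prior_part


def theta_kernel(theta, phi_draws, g=lambda t: 1.0):
    """Unnormalised p(theta | D_n) = g(theta) * P(theta - 1 <= phi <= theta | D_n)."""
    hits = sum(1 for phi in phi_draws if theta - 1.0 <= phi <= theta)
    return g(theta) * hits / len(phi_draws)


# Usage with simulated data and a standard normal base measure (both assumptions).
rng = random.Random(0)
y = [rng.gauss(2.0, 1.0) for _ in range(200)]
phi_draws = [draw_phi(y, 1.0, lambda r: r.gauss(0.0, 1.0), rng) for _ in range(500)]
kernel_at_mid = theta_kernel(sum(y) / len(y) + 0.5, phi_draws)
```

Evaluating `theta_kernel` on a grid of $\theta$ values and normalising gives an approximation to the marginal posterior density; with $g \equiv 1$ it is roughly flat on the interior of $[\phi, \phi + 1]$ and drops off near the end points.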

The alternative semi-parametric prior can be formulated as follows. Define $u = Y_1 - \phi$,
and assume $u$ has a continuous density $f$. The likelihood is thus given by $l_n(\phi, f) =
\prod_{i=1}^{n} f(Y_{1i} - \phi)$. We place any parametric prior $\pi(\phi)$ on $\phi$ and a Dirichlet mixture of normals
prior on $f$, which assumes $f(u) = \int h(u - z; 0, \sigma^2)\, dH(z)$ where $H$ is a probability measure
that has a Dirichlet process prior $D_\alpha$ and $\sigma^2$ is a variance parameter for the normal mixtures
that has an inverse Gamma prior (see Example 2.6 for details). We then obtain the posterior

$$p(\phi \mid D_n) \propto \int \pi(\phi)\, \pi(\sigma^2)\, D_\alpha(H) \prod_{i=1}^{n} \int h(Y_{1i} - \phi - z; 0, \sigma^2)\, dH(z)\, d\sigma^2\, dH.$$

The marginal posterior density function of $\theta$ evaluated at some fixed $\theta$ is

$$p(\theta \mid D_n) \propto g(\theta)\, P(\theta - 1 \le \phi \le \theta \mid D_n).$$



3 Posterior Consistency for $\theta$

In Bayesian analysis, one starts with prior knowledge (sometimes uninformative) on
the parameter and updates it according to the marginal posterior given the data. In classical
point identified parametric and semi-parametric models, under mild assumptions the posterior
is asymptotically normal due to the Bernstein von Mises theorem, and hence its shape
is asymptotically no longer affected by the prior specification. In contrast, the shape of
the posterior of a partially identified parameter still relies upon its prior distribution (see
Poirier (1998)) even asymptotically. Only the support of the prior distribution of $\theta$ (given
$\phi$) is revised after data are observed and eventually converges towards the true identified
set asymptotically. This corresponds to frequentist consistency of the posterior for partially
identified parameters and is due to the fact that the point-identified parameter completely
characterizes the support.

We assume there is a true value of $\phi$, denoted by $\phi_0$, which induces a true identified set
$\Theta(\phi_0)$ and a true $F$, denoted by $F_0$. Our goal is to achieve the frequentist posterior consistency
for the partially identified parameter: that is, for any $\varepsilon > 0$ there exists an $r \in (0, 1]$
such that

$$P(\theta \in \Theta(\phi_0)^{\varepsilon} \mid D_n) \to^{p} 1 \quad \text{and} \quad P(\theta \in \Theta(\phi_0)^{-\varepsilon} \mid D_n) \to^{p} (1 - r)$$

where

$$\Theta(\phi)^{\varepsilon} = \{\theta \in \Theta : d(\theta, \Theta(\phi)) \le \varepsilon\} \tag{3.1}$$

is the $\varepsilon$-envelope of $\Theta(\phi)$ and

$$\Theta(\phi)^{-\varepsilon} = \{\theta \in \Theta(\phi) : d(\theta, \Theta \setminus \Theta(\phi)) \ge \varepsilon\} \tag{3.2}$$

is the $\varepsilon$-contraction of $\Theta(\phi)$ with $\Theta \setminus \Theta(\phi) = \{\theta \in \Theta;\ \theta \notin \Theta(\phi)\}$ and $d(\theta, \Theta(\phi)) = \inf_{x \in \Theta(\phi)} \|\theta - x\|$,
see e.g. Molchanov (2005) and Chernozhukov, Hong and Tamer (2007). Thus, posterior
(or frequentist) consistency for a partially identified parameter means that the posterior
distribution of $\theta$ puts all its mass on a set whose boundary belongs to the set
$\{\theta \in \Theta;\ d(\theta, \partial\Theta(\phi_0)) \le \varepsilon\}$ where $\partial\Theta(\phi_0)$ denotes the boundary of $\Theta(\phi_0)$. Posterior
consistency is one of the benchmarks of a Bayesian procedure under consideration, which ensures
that with a sufficiently large amount of data, it is nearly possible to discover the
true identified set. Therefore lack of consistency is extremely undesirable. Liao and Jiang
(2010, 2011) study the posterior consistency for partially identified models, however with
a pseudo likelihood function whose probabilistic interpretation is still in question³. More
recently, Kitagawa (2011) considered the posterior consistency for $\Theta(\phi)$ in terms of the posterior
lower probability when the parametric form of the likelihood is known.
We recall that the conditional prior on $\theta$ (given $\phi$) is specified as

$$\pi(\theta \mid \phi) \propto g(\theta)\, I_{\theta \in \Theta(\phi)} \tag{3.3}$$

for some $g(\theta)$. In the special case where $\theta$ is point identified, $\{\theta\} = \Theta(\phi)$ becomes a
function of $\phi$, whose prior is completely determined by that of $\phi$ instead of by (3.3).

In this section we focus on the frequentist consistency of the marginal posterior of $\theta$
(marginalized with respect to the posterior of $\phi$). We will investigate the posterior concentration
rate of $\Theta(\phi)$ and $S_\phi(\cdot)$ in subsequent sections. For a measurable set $B \subset \Theta$, the
marginal posterior probability is given by (2.2):

$$P(\theta \in B \mid D_n) = \int_{\mathcal{F}} \pi(\theta \in B \mid \phi(F))\, p(F \mid D_n)\, dF$$

when the prior on $\phi$ is induced by the nonparametric prior specified on $F$, and by (2.5):

$$P(\theta \in B \mid D_n) = \int_{\Phi} \pi(\theta \in B \mid \phi)\, p(\phi \mid D_n)\, d\phi$$

when the prior on $\phi$ is specified through a semi-parametric prior as described in Section
2.2.2. Recall that $F$ and $\phi$ are point-identified and that frequentist asymptotic properties of the
marginal posterior of $\theta$ rely on frequentist asymptotic properties of the posterior of $F$ and
$\phi$. Therefore, we assume that the priors $\pi(F)$ and $\pi(\phi)$ specified for $F$ and $\phi$ are such that
the corresponding posterior distributions are consistent:



3 See Schennach (2005) for discussions of probabilistic interpretations of pseudo likelihood functions.

Assumption 3.1. At least one of the following holds:

(i). The measurable function $\phi : \mathcal{F} \to \Phi$ is continuous and the prior $\pi(F)$ is such that the
posterior $p(F \mid D_n)$ satisfies:

$$\int m(F)\, p(F \mid D_n)\, dF \to^{p} \int m(F)\, \delta_{F_0}(dF)$$

for any bounded and continuous function $m(\cdot)$ on $\mathcal{F}$ where $\delta_{F_0}$ is the Dirac function, and
$F_0$ is the true distribution function of $X$;

(ii). the prior $\pi(\phi)$ is such that the posterior $p(\phi \mid D_n)$ satisfies:

$$\int m(\phi)\, p(\phi \mid D_n)\, d\phi \to^{p} \int m(\phi)\, \delta_{\phi_0}(d\phi)$$

for any bounded and continuous function $m(\cdot)$ on $\Phi$.

Assumptions 3.1 (i) and (ii) correspond to the nonparametric and semi-parametric
prior, respectively, and are verified by many nonparametric and semi-parametric priors. Examples
are: Dirichlet process priors, Polya tree process priors, Gaussian process priors,
etc. We refer to Ghosh and Ramamoorthi (2003) for examples and sufficient conditions
for posterior consistency. For instance, when $\pi(F)$ is the Dirichlet process prior, the second
part of Assumption 3.1 (i) was proved in Ghosh and Ramamoorthi (2003, Theorem
3.2.7). The condition that $\phi(F)$ is continuous in $F$ is verified in many examples relevant
for applications. For instance, in example 2.1 $\phi(F) = E(Y \mid F)$ and in example 2.2
$\phi(F) = (E(ZY_1 \mid F), E(ZX^T \mid F), E(ZY_2 \mid F))$, which are all linear functionals of $F$.



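Assumption 3.2. For any $\varepsilon > 0$ there are measurable sets $A_1, A_2 \subset \Phi$ such that
$0 < \pi(\phi \in A_i) \le 1$, $i = 1, 2$ and

(i) for all $\phi \in A_1$, $\Theta(\phi_0)^{\varepsilon} \cap \Theta(\phi) \ne \emptyset$; for all $\phi \notin A_1$, $\Theta(\phi_0)^{\varepsilon} \cap \Theta(\phi) = \emptyset$,

(ii) for all $\phi \in A_2$, $\Theta(\phi_0)^{-\varepsilon} \cap \Theta(\phi) \ne \emptyset$; for all $\phi \notin A_2$, $\Theta(\phi_0)^{-\varepsilon} \cap \Theta(\phi) = \emptyset$.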

This assumption allows us to prove the posterior consistency without assuming the prior
$\pi(\theta \mid \phi)$ to be a continuous function of $\phi$, and therefore priors like $I_{\phi_1 < \theta < \phi_2}$ in the interval
censoring data example are allowed. Under this assumption and if the conditional prior
$\pi(\theta \mid \phi)$ is a regular conditional distribution, the conditional prior probability of the $\varepsilon$-envelope
(and of the $\varepsilon$-contraction) of the identified set can be approximated by a continuous function,
i.e., there is a sequence of bounded and continuous functions $h_m(\phi)$ such that (see lemma
C.1 in the appendix) almost surely in $\phi$:

$$\pi(\theta \in \Theta(\phi_0)^{\varepsilon} \mid \phi) = \lim_{m \to \infty} h_m(\phi).$$

A similar approximation holds for the conditional prior of the $\varepsilon$-contraction $\pi(\theta \in \Theta(\phi_0)^{-\varepsilon} \mid \phi)$.
Assumption 3.2 is satisfied as long as the identified set $\Theta(\phi)$ is compact and the prior of $\phi$
is spread over a large support of the parameter space.

Assumption 3.3. For any $\varepsilon > 0$, and $\phi \in \Phi$, $\pi(\theta \in \Theta(\phi)^{-\varepsilon} \mid \phi) < 1$.

This is an assumption on the prior for $\theta$, which means the identified set should be sharp
with respect to the prior information. Roughly speaking, the support of the prior should
not be a proper subset of any $\varepsilon$-contraction of the identified set $\Theta(\phi)$. If instead the
prior information restricts $\theta$ to be inside a strict subset of $\Theta(\phi)$ so that Assumption 3.3 is
violated, then that prior information should be taken into account and we should shrink
$\Theta(\phi)$ to a sharper set. In the special case when $\theta$ is point identified ($\Theta(\phi)$ is a singleton),
the $\varepsilon$-contraction is empty and thus $\pi(\theta \in \Theta(\phi)^{-\varepsilon} \mid \phi) = 0$.

The following theorem gives the posterior consistency for partially identified parameters. 

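Theorem 3.1. Let $\pi(\theta \mid \phi)$ be a regular conditional distribution. Under assumptions 3.1-3.3,
for any $\varepsilon > 0$, there is $r \in (0, 1]$ such that

$$P(\theta \in \Theta(\phi_0)^{\varepsilon} \mid D_n) \to^{p} 1 \quad \text{and} \quad P(\theta \in \Theta(\phi_0)^{-\varepsilon} \mid D_n) \to^{p} (1 - r).$$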
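As a heuristic illustration of the two limits (a sketch under added assumptions, not part of the formal result), consider the interval example of Section 2.3 with $\Theta(\phi) = [\phi, \phi + 1]$. Assume a uniform conditional prior ($g \equiv 1$) on $\Theta(\phi)$, $0 < \varepsilon < 1/2$, a parameter space $\Theta$ that contains $[\phi_0 - \varepsilon, \phi_0 + 1 + \varepsilon]$, and a posterior of $\phi$ that concentrates at $\phi_0$. Writing $|\cdot|$ for the length of an interval, as $\phi \to \phi_0$:

```latex
\begin{align*}
\Theta(\phi_0)^{\varepsilon} &= [\phi_0 - \varepsilon,\ \phi_0 + 1 + \varepsilon],
& \pi(\theta \in \Theta(\phi_0)^{\varepsilon} \mid \phi)
  &= \big|[\phi, \phi + 1] \cap \Theta(\phi_0)^{\varepsilon}\big| \to 1, \\
\Theta(\phi_0)^{-\varepsilon} &= [\phi_0 + \varepsilon,\ \phi_0 + 1 - \varepsilon],
& \pi(\theta \in \Theta(\phi_0)^{-\varepsilon} \mid \phi)
  &= \big|[\phi, \phi + 1] \cap \Theta(\phi_0)^{-\varepsilon}\big| \to 1 - 2\varepsilon.
\end{align*}
```

Under these assumptions the posterior mass of the $\varepsilon$-envelope tends to one while that of the $\varepsilon$-contraction tends to $1 - 2\varepsilon$, so the limit in the theorem would hold with $r = 2\varepsilon$ in this special case.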

4 Posterior consistency for $\Theta(\phi)$

Let $\phi_0$ be the true value of $\phi$, which corresponds to the true identified set $\Theta(\phi_0)$. The
estimation accuracy of the identified set is often measured, in the literature, by the Hausdorff
distance. Specifically, for a point $a$ and a set $A$, let $d(a, A) = \inf_{x \in A} \|a - x\|$, where $\|\cdot\|$
denotes the Euclidean norm. The Hausdorff distance between sets $A$ and $B$ is defined as

$$d_H(A, B) = \max\Big\{\sup_{a \in A} d(a, B),\ \sup_{b \in B} d(b, A)\Big\} = \max\Big\{\sup_{a \in A} \inf_{b \in B} \|a - b\|,\ \sup_{b \in B} \inf_{a \in A} \|b - a\|\Big\}.$$

It follows immediately that $d_H(A, B) = d_H(B, A)$ and when both $A$ and $B$ are compact,
$d_H(A, B) = 0$ if and only if $A = B$. This section aims at deriving the rate $r_n = o(1)$ such
that for some constant $C > 0$,

$$P\big(d_H(\Theta(\phi), \Theta(\phi_0)) < C r_n \mid D_n\big) \to^{p} 1.$$

The above result rests on the posterior concentration rate for $\phi$, in the sense that
$r_n$ is the same as the concentration rate for $\phi$, together with a continuity condition on
$d_H(\Theta(\phi), \Theta(\phi_0))$ with respect to $\phi$.
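The Hausdorff distance defined above is straightforward to compute for simple sets. The following sketch is illustrative only: it treats finite point sets (for instance, a grid approximation of an identified set) and closed intervals, and the function names are hypothetical.

```python
import math


def dist_point_set(a, B):
    """d(a, B) = inf_{x in B} ||a - x|| for a finite set B of points (tuples)."""
    return min(math.dist(a, b) for b in B)


def hausdorff(A, B):
    """d_H(A, B) = max{ sup_a d(a, B), sup_b d(b, A) } for finite point sets."""
    return max(max(dist_point_set(a, B) for a in A),
               max(dist_point_set(b, A) for b in B))


def hausdorff_interval(a, b):
    """Closed form for closed intervals a = (a_lo, a_hi) and b = (b_lo, b_hi)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))
```

For the interval identified set $\Theta(\phi) = [\phi, \phi + 1]$ of Section 2.3 the closed form gives $d_H(\Theta(\phi), \Theta(\phi_0)) = |\phi - \phi_0|$, which makes explicit why the concentration rate of $\Theta(\phi)$ is inherited from that of $\phi$ in that example.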

In a semi-parametric Bayesian model where $\phi$ is point identified and either a nonparametric
or a semi-parametric prior is placed, the posterior of $\phi$ achieves a near-parametric
concentration rate under proper conditions on the prior. Since our goal is to study the posterior
of $\Theta(\phi)$ and $\theta$ instead of $\phi$, we state a high level assumption on the posterior of $\phi$ as
follows instead of deriving it from more general conditions. More formal derivations of this
assumption will be presented in appendix B.

Assumption 4.1. The marginal posterior of $\phi$ is such that

$$P\big(\|\phi - \phi_0\| \le C n^{-1/2} (\log n)^{1/2} \mid D_n\big) \to^{p} 1.$$

This assumption is imposed for both kinds of priors described in Section 2, and is a
standard result in the semi-nonparametric Bayesian literature. If we place a nonparametric
prior on $F$ as described in Section 2.2.1, this assumption becomes

$$P\big(\|\phi(F) - \phi(F_0)\| \le C n^{-1/2} (\log n)^{1/2} \mid D_n\big) \to^{p} 1.$$

Primitive conditions for the validity of this case can be found in a recent work by Rivoirard
and Rousseau (2012). On the other hand, if we parametrize the model in $\mathcal{F} = \{F_{\phi,\eta} :
\phi \in \Phi,\ \eta \in \mathcal{P}\}$ as described in Section 2.2.2, with $\eta$ being an infinite-dimensional nuisance
parameter, a sufficient condition for Assumption 4.1 is found in both Bickel and Kleijn
(2012) and the appendix of this paper.

Instead of assuming continuity of $d_H(\Theta(\phi), \Theta(\phi_0))$ with respect to $\phi$, which is sufficient
in order to get the concentration rate of $\Theta(\phi)$, we place a less demanding assumption that still
allows us to get the concentration rate. With this aim, we consider a more specific partially
identified model: the moment inequality model, which assumes that $\theta$ satisfies $k$ moment
restrictions:

$$\Psi(\theta, \phi) \le 0, \qquad \Psi(\theta, \phi) = (\Psi_1(\theta, \phi), \ldots, \Psi_k(\theta, \phi))^T \tag{4.1}$$

where $\Psi : \Theta \times \Phi \to \mathbb{R}^k$ is a known function of $(\theta, \phi)$. The model depends on the data $X$ via
the point identified parameter $\phi$. In the moment inequality model, the identified set can be
characterized as:

$$\Theta(\phi) = \{\theta \in \Theta : \Psi(\theta, \phi) \le 0\}. \tag{4.2}$$

Since most of the partially identified models can be characterized as moment inequality models,
model (4.1)-(4.2) has received extensive attention in the partially identified literature.

Assumption 4.2. The parameter space $\Theta \times \Phi$ is compact.

Assumption 4.3. $\{\Psi(\theta, \cdot) : \theta \in \Theta\}$ is Lipschitz equi-continuous on $\Phi$, that is, for some
$K > 0$, $\forall \phi_1, \phi_2 \in \Phi$,

$$\sup_{\theta \in \Theta} \|\Psi(\theta, \phi_1) - \Psi(\theta, \phi_2)\| \le K \|\phi_1 - \phi_2\|.$$

Given the compactness of $\Theta$, this assumption is satisfied by many interesting examples
of moment inequality models.

Assumption 4.4. There exists a closed neighborhood $U(\phi_0)$ of $\phi_0$, such that for any $a_n =
O(1)$, and any $\phi \in U(\phi_0)$, there exists $C > 0$ that might depend on $\phi$,

$$\inf_{\theta:\ d(\theta, \Theta(\phi)) \ge C a_n}\ \max_{i \le k} \Psi_i(\theta, \phi) > a_n.$$

Intuitively, when $\theta$ is bounded away from $\Theta(\phi)$ (up to a rate $a_n$), at least one of the moment
inequalities is violated, which means $\max_{i \le k} \Psi_i(\theta, \phi) > 0$. This assumption quantifies
how much $\max_{i \le k} \Psi_i(\theta, \phi)$ will depart from zero. This is a sufficient condition for the partial
identification condition in Chernozhukov, Hong and Tamer (2007). If we define

$$Q(\theta, \phi) = \|\max(\Psi(\theta, \phi), 0)\| = \Big[\sum_{i=1}^{k} \big(\max(\Psi_i(\theta, \phi), 0)\big)^2\Big]^{1/2}$$

then $Q(\theta, \phi) = 0$ if and only if $\theta \in \Theta(\phi)$. The partial identification condition in Chernozhukov
et al. (2007, Condition (4.5)) assumes that there exists $K > 0$ so that for all $\theta$,

$$Q(\theta, \phi) \ge K\, d(\theta, \Theta(\phi)), \tag{4.3}$$

which says $Q$ should be bounded below by a number proportional to the distance from the
identified set if $\theta$ is bounded away from the identified set. Assumption 4.4 is a sufficient
condition for (4.3).

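Example 4.1 (Interval censored data - continued). In the interval censoring data example,
$\Psi(\theta, \phi) = (\theta - \phi_2, \phi_1 - \theta)^T$; for any $\phi = (\phi_1, \phi_2)$ and $\tilde{\phi} = (\tilde{\phi}_1, \tilde{\phi}_2)$, $\|\Psi(\theta, \phi) - \Psi(\theta, \tilde{\phi})\| =
\|\phi - \tilde{\phi}\|$. This verifies Assumption 4.3. Moreover, for any $\theta$ such that $d(\theta, \Theta(\phi)) > a_n$, either
$\theta < \phi_1 - a_n$ or $\theta > \phi_2 + a_n$. If $\theta < \phi_1 - a_n$, then $\Psi_2(\theta, \phi) = \phi_1 - \theta > a_n$; if $\theta > \phi_2 + a_n$, then
$\Psi_1(\theta, \phi) = \theta - \phi_2 > a_n$. This verifies Assumption 4.4. □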
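The claims of Example 4.1 can be checked numerically. The sketch below is an illustration with hypothetical function names: it encodes $\Psi$ for the interval example, the criterion $Q(\theta, \phi) = \|\max(\Psi(\theta, \phi), 0)\|$, and the distance to $\Theta(\phi) = [\phi_1, \phi_2]$. In this example $Q$ coincides with the distance, so condition (4.3) holds with $K = 1$.

```python
import math


def psi(theta, phi):
    """Psi(theta, phi) = (theta - phi_2, phi_1 - theta), as in Example 4.1."""
    phi1, phi2 = phi
    return (theta - phi2, phi1 - theta)


def criterion(theta, phi):
    """Q(theta, phi) = || max(Psi(theta, phi), 0) || (Euclidean norm)."""
    return math.sqrt(sum(max(v, 0.0) ** 2 for v in psi(theta, phi)))


def dist_to_set(theta, phi):
    """d(theta, Theta(phi)) for the interval Theta(phi) = [phi_1, phi_2]."""
    phi1, phi2 = phi
    return max(phi1 - theta, theta - phi2, 0.0)


def psi_gap(theta, phi_a, phi_b):
    """|| Psi(theta, phi_a) - Psi(theta, phi_b) ||, used to check Assumption 4.3."""
    return math.dist(psi(theta, phi_a), psi(theta, phi_b))
```

Because `psi_gap` does not depend on `theta` and equals the Euclidean distance between the two values of $\phi$, the Lipschitz constant in Assumption 4.3 can be taken as $K = 1$ here.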

The following theorem shows the concentration rate for the identified set. 

Theorem 4.1. Under Assumptions 4.1-4.4, for some $C > 0$,

$$P\big(d_H(\Theta(\phi), \Theta(\phi_0)) > C n^{-1/2} (\log n)^{1/2} \mid D_n\big) \to^{p} 0.$$

Remark 4.1. The above result holds for both the nonparametric prior on $\phi(F)$ and the semi-parametric
prior on $(\phi, \eta)$ as described in Section 2. The concentration rate is nearly parametric: $n^{-1/2} (\log n)^{1/2}$.
The term $\sqrt{\log n}$ arises commonly in the posterior concentration rate literature. The posterior
probability in the theorem is now converging to zero, instead of only being smaller than
an arbitrarily small constant. The same rate of convergence in the frequentist perspective has
been achieved by Chernozhukov et al. (2007), Beresteanu and Molinari (2008), Kaido and
Santos (2011), among others.

Remark 4.2. Recently Kitagawa (2012) obtained the posterior consistency for $\Theta(\phi)$: for
any $\varepsilon > 0$,

$$\lim_{n \to \infty} P\big(d_H(\Theta(\phi), \Theta(\phi_0)) > \varepsilon \mid D_n\big) = 0$$

for almost every sampling sequence of $D_n$. This result was obtained for the case where $\theta$ is a
scalar whose identified set $\Theta(\phi)$ is a connected interval and $d_H(\Theta(\phi), \Theta(\phi_0))$ is assumed to
be a continuous map of $\phi$. In multi-dimensional cases where $\Theta(\phi)$ is a more general convex
set, however, verifying the continuity of $d_H(\Theta(\phi), \Theta(\phi_0))$ is much more technically involved,
due to the challenge of computing the Hausdorff distance in multi-dimensional manifolds. In
contrast, our Lipschitz equi-continuity condition in Assumption 4.3 is much easier to verify
in specific examples, as it depends on the moment conditions directly.



5 Bayesian Inference of Support Function 

This section develops Bayesian inference for the support function $S_\phi(p)$ of the identified
set $\Theta(\phi)$ in the moment inequality model (4.1)-(4.3). Bayesian inference for the support
function has two main interests. First, it provides an alternative way to perform estimation
of the identified set $\Theta(\phi)$. Second, it allows us to construct a two-sided BCS for $\Theta(\phi)$ in the
next section. In this section, we first develop an asymptotically valid linearization in $\phi$ of the
support function. Based on this result we show that posterior consistency can be achieved
and prove the Bernstein von Mises theorem for the support function.

5.1 Moment Inequality Model 

Our analysis focuses on identified sets which are closed and convex. These sets are com- 
pletely determined by their support functions, and efficient estimation of support functions 
may lead to optimality of estimation and inference of the identified set. As a result, much 
of the new development in the partially identified literature focuses on the support function, 
e.g., Kaido and Santos (2011), Kaido (2012), Beresteanu and Molinari (2008), Bontemps, 
Magnac and Maurin (2012). 

In the moment inequality model, $\Theta(\phi) := \{\theta \in \Theta;\ \Psi(\theta, \phi) \le 0\}$, where $\Psi(\theta, \phi)$ is given
in (4.1) and each component of $\Psi(\theta, \phi)$ is a convex function of $\theta$ for every $\phi \in \Phi$ as stated
in the next assumption.

Assumption 5.1. $\Psi(\theta, \phi)$ is continuous in $(\theta, \phi)$ and convex in $\theta$ for every $\phi \in \Phi$.

Let us consider the support function $S_\phi(\cdot) : \mathbb{S}^d \to \mathbb{R}$ of the identified set $\Theta(\phi)$. We restrict
its domain to the unit sphere $\mathbb{S}^d$ in $\mathbb{R}^d$ since $S_\phi(p)$ is positively homogeneous in $p$. Under
assumption 5.1 the support function is the optimal value of an ordinary convex program:

$$S_\phi(p) = \sup_{\theta \in \Theta} \{p^T \theta;\ \Psi(\theta, \phi) \le 0\}$$

and therefore it also admits a Lagrangian representation (see Rockafellar, chapter 28):

$$S_\phi(p) = \sup_{\theta \in \Theta} \{p^T \theta - \lambda(p, \phi)^T \Psi(\theta, \phi)\}, \tag{5.1}$$

where $\lambda(p, \phi) : \mathbb{S}^d \times \mathbb{R}^{d_\phi} \to \mathbb{R}^k_+$ is a $k$-vector of Lagrange multipliers. Note that $d_\phi$ is the
dimension of $\phi$.

We denote by $\Psi_S(\theta, \phi_0)$ the $k_S$-subvector containing the constraints that are strictly
convex functions of $\theta$ and by $\Psi_L(\theta, \phi_0)$ the $k_L$ constraints that are linear in $\theta$. Obviously,
$k_S + k_L = k$. The corresponding Lagrange multipliers are denoted by $\lambda_S(p, \phi_0)$ and $\lambda_L(p, \phi_0)$,
respectively, for $p \in \mathbb{S}^d$. Moreover, define $\Xi(p, \phi) = \arg\max_{\theta \in \Theta} \{p^T \theta;\ \Psi(\theta, \phi) \le 0\}$ as the
support set of $\Theta(\phi)$. Then, by definition,

$$p^T \theta = S_\phi(p), \qquad \forall \theta \in \Xi(p, \phi).$$

We also denote by $\nabla_\phi \Psi(\theta, \phi)$ the $k \times d_\phi$ matrix of partial derivatives of $\Psi$ with respect to
$\phi$. Let $B(\phi_0, \delta) = \{\phi \in \Phi;\ \|\phi - \phi_0\| \le \delta\}$ denote a closed ball centered at $\phi_0$ with radius $\delta$.
For every $\phi \in B(\phi_0, r_n)$ and $\theta \in \Theta(\phi)$, we denote by $Act(\theta, \phi) := \{i;\ \Psi_i(\theta, \phi) = 0\}$ the set
of the inequality active constraint indices and by $d_A(\theta, \phi)$ the number of its elements. For
every $i \in Act(\theta, \phi)$, $\nabla_\theta \Psi_i(\theta, \phi)$ denotes the $d$-vector of partial derivatives of $\Psi_i$ with respect
to $\theta$. We assume the following:

Assumption 5.2. The true value $\phi_0$ is in the interior of $\Phi$, and $\Theta$ is convex and compact.

Assumption 5.3. There is some $\delta > 0$ such that for all $\phi \in B(\phi_0, \delta)$, we have:

(i) the $k \times d_\phi$ matrix $\nabla_\phi \Psi(\theta, \phi)$ exists and is continuous in $(\theta, \phi)$;

(ii) the set $\Theta(\phi)$ is non empty;

(iii) there exists a $\theta \in \Theta$ such that $\Psi(\theta, \phi) < 0$;

(iv) $\Theta(\phi) \subset int(\Theta)$ where $int(\Theta)$ denotes the interior of $\Theta$;

(v) for every $i \in Act(\theta, \phi_0)$, with $\theta \in \Theta(\phi_0)$, the vector $\nabla_\theta \Psi_i(\theta, \phi)$ exists and is continuous
in $(\theta, \phi) \in \Theta \times B(\phi_0, \delta)$.

Assumption 5.3 (iii) implies assumption 5.3 (ii). However, we prefer to keep both conditions
since in order to establish some technical results we only need condition (ii), which is
weaker.

The next assumption concerns the inequality active constraints. In particular, assumption
5.4 (i) may be restrictive in the one dimensional case (i.e. $d = 1$) but is easily verified in
the cases with $d > 1$. For instance, in example 2.1 this assumption is not verified in the
degenerate case where $\phi_1 = \phi_2$. Assumption 5.4 (ii) says that the active inequality constraint
gradients $\nabla_\theta \Psi_i(\theta, \phi_0)$ are linearly independent. This assumption guarantees that a $\theta$ which
solves the optimization problem (5.1) with $\phi = \phi_0$ satisfies the Kuhn-Tucker conditions.
Alternative assumptions that are weaker than assumption 5.4 (ii) could be used, but the
advantage of assumption 5.4 (ii) is that it is easy to check.

Assumption 5.4. (i) $d_A(\theta, \phi_0) \le d$ for every $\theta \in \Theta(\phi_0)$ where $d_A(\theta, \phi_0)$ is the number of
active constraints;

(ii) the gradient vectors $\{\nabla_\theta \Psi_i(\theta, \phi_0)\}_{i \in Act(\theta, \phi_0)}$ are linearly independent $\forall \theta \in \Theta(\phi_0)$.

The following assumption is sufficient for the differentiability of the support function at
$\phi_0$:

Assumption 5.5. At least one of the following holds:

(i) For the ball $B(\phi_0, \delta)$ in Assumption 5.3, for every $(p, \phi) \in \mathbb{S}^d \times B(\phi_0, \delta)$, $\Xi(p, \phi)$ is a
singleton;

(ii) There are linear constraints in $\Psi(\theta, \phi_0)$, which are also separable in $\theta$, that is, $\Psi_L(\theta, \phi_0) =
A_1 \theta + A_2(\phi_0)$ for some function $A_2 : \Phi \to \mathbb{R}^{k_L}$ (not necessarily linear) and some
$(k_L \times d)$-matrix $A_1$.

Assumption 5.5 is particularly important for the linearization of the support function
that we develop in section 5.2. In fact, if one of the two parts of Assumption 5.5 holds then
the support function is differentiable for every $(p, \phi) \in \mathbb{S}^d \times B(\phi_0, \delta)$ and we have a closed
form for its derivative.

The last set of assumptions that we introduce will be used to prove the Bernstein von
Mises theorem for $S_\phi(\cdot)$ and allows us to strengthen the result of theorem 5.1 below. The first
three assumptions are (local) Lipschitz equi-continuity assumptions.

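Assumption 5.6. For the ball $B(\phi_0, \delta)$ in assumption 5.3, for some $K_1, K_2, K_3 > 0$ and
$\forall \phi_1, \phi_2 \in B(\phi_0, \delta)$:

(i) $\sup_{p \in \mathbb{S}^d} \|\lambda(p, \phi_1) - \lambda(p, \phi_2)\| \le K_1 \|\phi_1 - \phi_2\|$;

(ii) $\sup_{\theta \in \Theta} \|\nabla_\phi \Psi(\theta, \phi_1) - \nabla_\phi \Psi(\theta, \phi_2)\| \le K_2 \|\phi_1 - \phi_2\|$;

(iii) $\|\nabla_\phi \Psi(\theta_1, \phi_0) - \nabla_\phi \Psi(\theta_2, \phi_0)\| \le K_3 \|\theta_1 - \theta_2\|$, for every $\theta_1, \theta_2 \in \Theta$;

(iv) If $\Xi(p, \phi_0)$ is a singleton $\forall p \in W$ for some compact subset $W \subset \mathbb{S}^d$ then there exists an
$\varepsilon = O(\delta)$ such that $\Xi(p, \phi_1) \subseteq \Xi(p, \phi_0)^{\varepsilon}$.

We show in the following example that Assumptions 5.1-5.6 are easily satisfied.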

Example 5.1 (Interval censored data - continued). The setup is the same as in Example 2.1.
Assumption 5.2 is verified if $Y_1$ and $Y_2$ are two random variables with finite first moments
$\phi_{0,1}$ and $\phi_{0,2}$, respectively. Moreover, $\Psi(\theta, \phi) = (\phi_1 - \theta, \theta - \phi_2)^T$, $\phi = (\phi_1, \phi_2)^T$,
so that Assumptions 5.1, 5.2 and 5.3 (i)-(ii) are trivially satisfied. Assumption 5.3 (iii) holds
for every $\theta$ inside $(\phi_1, \phi_2)$; Assumption 5.3 (iv) is satisfied if $\phi_1$ and $\phi_2$ are bounded. To see
that Assumptions 5.3 (v) and 5.4 are satisfied note that $\forall \theta < \phi_{0,1}$ we have $Act(\theta, \phi_0) =
\{1\}$, $\forall \theta > \phi_{0,2}$ we have $Act(\theta, \phi_0) = \{2\}$ while $\forall \theta \in [\phi_{0,1}, \phi_{0,2}]$ we have $Act(\theta, \phi_0) = \emptyset$.
Assumption 5.5 (i) and (ii) are both satisfied since the support set takes the values $\Xi(1, \phi) =
\phi_2$ and $\Xi(-1, \phi) = -\phi_1$ and the constraints in $\Psi(\theta, \phi_0)$ are both linear with $A_1 = (-1, 1)^T$
and $A_2(\phi_0) = \nabla_\phi \Psi(\theta, \phi_0)\, \phi_0$.

In order to verify assumption 5.6 we use the largest eigenvalue as the matrix norm. The
eigenvalues of $\nabla_\phi \Psi(\theta, \phi)$ are $\{1, -1\}$ for every $\theta$ and $\phi$. Hence, assumptions 5.6 (ii)-(iii) are
verified. The Lagrange multiplier is $\lambda(p, \phi) = (-p I(p < 0),\ p I(p > 0))^T$ so that assumption
5.6 (i) is satisfied since the norm is equal to 0. Finally, the support set $\Xi(p, \phi) = \phi_1 I(p <
0) + \phi_2 I(p > 0)$ is a singleton for every $\phi \in B(\phi_0, \delta)$ and $\Xi(p, \phi_0)^{\varepsilon} = \{\theta \in \Theta;\ \|\theta - \theta_*\| \le \varepsilon\}$
where $\theta_* = \Xi(p, \phi_0) = \phi_{0,1} I(p < 0) + \phi_{0,2} I(p > 0)$. Therefore, $\|\Xi(p, \phi) - \theta_*\| \le \delta$ and
assumption 5.6 (iv) holds with $\varepsilon = \delta$. □
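The Lagrangian representation (5.1) can be verified directly in this example. The sketch below is a numerical illustration with hypothetical function names, restricted to $d = 1$ so that the unit sphere is $\{-1, 1\}$: with the multiplier $\lambda(p, \phi)$ given in Example 5.1, the objective $p\theta - \lambda(p, \phi)^T \Psi(\theta, \phi)$ is constant in $\theta$ and equal to the support function of $[\phi_1, \phi_2]$.

```python
def psi(theta, phi):
    """Psi(theta, phi) = (phi_1 - theta, theta - phi_2), as in Example 5.1."""
    return (phi[0] - theta, theta - phi[1])


def multiplier(p):
    """lambda(p, phi) = (-p * 1{p < 0}, p * 1{p > 0})."""
    return (-p if p < 0 else 0.0, p if p > 0 else 0.0)


def support_function(p, phi):
    """S_phi(p) = sup{ p * theta : phi_1 <= theta <= phi_2 }."""
    return max(p * phi[0], p * phi[1])


def lagrangian(theta, p, phi):
    """p * theta - lambda(p, phi)^T Psi(theta, phi), the objective in (5.1)."""
    lam = multiplier(p)
    return p * theta - sum(l * v for l, v in zip(lam, psi(theta, phi)))
```

For $p = 1$ the objective reduces to $\theta - (\theta - \phi_2) = \phi_2$ and for $p = -1$ to $-\theta - (\phi_1 - \theta) = -\phi_1$, which are the two values of the support function.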



5.2 Asymptotic Analysis 

The support function of a closed and convex set is in general non-differentiable in $p$
but it admits directional derivatives, see e.g. Milgrom and Segal (2002). Luckily, when
assumption 5.5 holds the derivative of the support function exists. We exploit this fact to
derive an expansion in $\phi$ of the support function. This allows us to establish a Bernstein-von
Mises type result for the posterior distribution of the support function.

The next theorem states that the support function can be locally approximated (asymptotically)
by a linear function of $\phi_1, \phi_2 \in B(\phi_0, r_n)$ for $r_n = o(1)$ a bounded sequence depending
on the sample size $n$. The expansion is stochastic when $\phi$ is interpreted as a random
variable associated with the posterior distribution $P(\cdot \mid D_n)$.

Theorem 5.1. Let $\theta_*(p) : \mathbb{S}^d \to \Theta$ be a Borel measurable mapping satisfying $\theta_*(p) \in \Xi(p, \phi_0)$
for all $p \in \mathbb{S}^d$. If assumptions 5.1-5.5 hold with $\delta = r_n$ for some $r_n = o(1)$, then there
is an $N$ such that for every $n \ge N$ there exist: (i) a real function $f(\phi_1, \phi_2)$ defined for
every $\phi_1, \phi_2 \in B(\phi_0, r_n)$ and (ii) a function $\lambda(p, \phi_0) : \mathbb{S}^d \times \mathbb{R}^{d_\phi} \to \mathbb{R}^k_+$ such that for every
$\phi_1, \phi_2 \in B(\phi_0, r_n)$:

$$\sup_{p \in \mathbb{S}^d} \big|(S_{\phi_1}(p) - S_{\phi_2}(p)) - \lambda(p, \phi_0)^T \nabla_\phi \Psi(\theta_*(p), \phi_0)[\phi_1 - \phi_2]\big| = f(\phi_1, \phi_2)$$

and $f(\phi_1, \phi_2) / \|\phi_1 - \phi_2\| \to 0$ uniformly in $\phi_1, \phi_2 \in B(\phi_0, r_n)$ as $n \to \infty$.

We remark that the coefficient $\lambda(p, \phi_0)^T \nabla_\phi \Psi(\theta_*(p), \phi_0)$ of the linear term does not depend
on the specific choice of $\phi_1$ and $\phi_2$ inside $B(\phi_0, r_n)$, but only on $p$ and the true value $\phi_0$. With
the approximation given in the theorem we are now ready to state posterior consistency (with
concentration rate) and asymptotic normality of the posterior distribution of $S_\phi(p)$. In the next
theorems we will set $r_n = (\log n)^{1/2} n^{-1/2}$ when $n$ is sufficiently large.

Theorem 5.2. Under assumption 4.1 and the assumptions of Theorem 5.1 with $r_n = \sqrt{(\log n)/n}$, for some $C > 0$,

$$P\Big(\sup_{p \in S^d} |S_\phi(p) - S_{\phi_0}(p)| < C (\log n)^{1/2} n^{-1/2} \,\Big|\, D_n\Big) \to^p 1. \qquad (5.2)$$

Remark 5.1. Notice that $d_H(\Theta(\phi), \Theta(\phi_0)) = \sup_{p \in S^d} |S_\phi(p) - S_{\phi_0}(p)|$. Therefore, (5.2) is another statement of Theorem 4.1. However, the two results are obtained by different proof strategies. In particular, Theorem 5.2 is obtained as a byproduct of the Bernstein-von Mises theorem stated in Theorem 5.3 below and is based on the asymptotic local expansion of the support function in Theorem 5.1. As will be shown below, this expansion also yields the Bernstein-von Mises theorem of the support function, that is, the posterior of the support function is asymptotically normal.
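As a small numerical illustration of the identity in Remark 5.1, the following sketch compares the Hausdorff distance between two intervals computed from their support functions with a brute-force grid computation. The function names and the restriction to one-dimensional intervals (for which the unit sphere reduces to the two directions -1 and +1) are assumptions made purely for illustration.

```python
def support_interval(interval, p):
    """Support function of [a, b] in direction p in {-1, +1}: sup of p * theta."""
    a, b = interval
    return b if p > 0 else -a


def hausdorff_via_support(i1, i2):
    """Hausdorff distance as the sup over unit directions of |S_1(p) - S_2(p)|."""
    return max(abs(support_interval(i1, p) - support_interval(i2, p))
               for p in (-1, 1))


def hausdorff_brute_force(i1, i2, grid=2001):
    """Hausdorff distance from its definition, using a grid on each interval."""
    def points(interval):
        a, b = interval
        return [a + (b - a) * k / (grid - 1) for k in range(grid)]

    def dist_to_interval(x, interval):
        a, b = interval
        return max(a - x, 0.0, x - b)

    d12 = max(dist_to_interval(x, i2) for x in points(i1))
    d21 = max(dist_to_interval(x, i1) for x in points(i2))
    return max(d12, d21)


if __name__ == "__main__":
    pair = ((0.35, 0.65), (0.30, 0.70))
    print(hausdorff_via_support(*pair), hausdorff_brute_force(*pair))
```

For intervals the supremum is attained at an endpoint, so the grid computation agrees with the support-function formula up to floating-point error.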

We now state a Bernstein-von Mises (BvM) theorem for the support function. This theorem is valid under the assumption that a BvM theorem holds for the posterior distribution of the finite-dimensional identified parameter $\phi$. We denote by $\|\cdot\|_{TV}$ the total variation distance, that is, for two probability measures $P$ and $Q$,

$$\|P - Q\|_{TV} := \sup_B |P(B) - Q(B)|$$

where $B$ is an element of the $\sigma$-algebra on which $P$ and $Q$ are defined.

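Assumption 5.7. Let $P_{\sqrt{n}(\phi - \phi_0)|D_n}$ denote the posterior distribution of $\sqrt{n}(\phi - \phi_0)$. We assume

$$\left\| P_{\sqrt{n}(\phi - \phi_0)|D_n} - N_{d_\phi}(\Delta_{n,\phi_0}, I_{\phi_0}^{-1}) \right\|_{TV} \to^p 0$$

where $N_{d_\phi}$ denotes the $d_\phi$-dimensional normal distribution, $\Delta_{n,\phi_0} := n^{-1/2} \sum_{i=1}^n I_{\phi_0}^{-1} \ell_{\phi_0}(X_i)$; $\ell_{\phi_0}$ is the semi-parametric efficient score function of the model and $I_{\phi_0}$ denotes the semi-parametric efficient information matrix.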

For primitive conditions for the validity of this assumption in semi-parametric models we refer to Bickel and Kleijn (2012) and Rivoirard and Rousseau (2012). Despite the notation, note that $\ell_{\phi_0}$ and $I_{\phi_0}$ also depend either on the true $\eta_0$ or on the true $F_0$, depending on whether the model has been re-parameterized or not. The semi-parametric efficient score function and the semi-parametric efficient information contribute to the stochastic local asymptotic normality (LAN) expansion of the integrated likelihood, which is necessary in order to get the BvM result in assumption 5.7. A precise definition of $\ell_{\phi_0}$ and $I_{\phi_0}$ may be found in van der Vaart (2002) (Definition 2.15).



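Theorem 5.3. If the assumptions of Theorem 5.1 and assumption 5.6 hold with $\delta = r_n = \sqrt{(\log n)/n}$, then under assumption 5.7:

$$\sup_{p \in S^d} \left\| P_{\sqrt{n}(S_\phi(p) - S_{\phi_0}(p))|D_n} - N(\bar\Delta_{n,\phi_0}, \bar I_{\phi_0}) \right\|_{TV} \to^p 0$$

where $\bar\Delta_{n,\phi_0} = \lambda(p, \phi_0)^T \nabla_\phi \Psi(\theta_*(p), \phi_0) \Delta_{n,\phi_0}$ and

$$\bar I_{\phi_0} = \lambda(p, \phi_0)^T \nabla_\phi \Psi(\theta_*(p), \phi_0) I_{\phi_0}^{-1} \nabla_\phi \Psi(\theta_*(p), \phi_0)^T \lambda(p, \phi_0).$$

The asymptotic mean and covariance matrix may be easily estimated by replacing $\phi_0$ by any consistent estimator $\hat\phi$ of $\phi_0$. Then $\theta_*(p)$ is replaced by an element $\hat\theta_*(p) \in \Xi(p, \hat\phi)$, and an estimate of $\lambda(p, \phi_0)$ is obtained by solving (numerically if necessary) the ordinary convex program in (5.1) with $\phi_0$ replaced by $\hat\phi$.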

Remark 5.2. The posterior asymptotic variance of the support function, $\bar I_{\phi_0}$, is the same as that of the frequentist estimator obtained by Kaido and Santos (2012, Theorem 3.2), and both are derived based on a linear expansion of the support function. On one hand, the linear expansion of Theorem 5.1 is obtained from expanding $\Psi(\theta_*(p), \phi) - \Psi(\theta_*(p), \phi_0)$ in a neighborhood of $\phi_0$. This gives the asymptotic variance $\nabla_\phi \Psi(\theta_*(p), \phi_0) I_{\phi_0}^{-1} \nabla_\phi \Psi(\theta_*(p), \phi_0)^T$, which is semi-parametric efficient for estimating $\Psi(\theta_*(p), \phi_0)$ as guaranteed by the Bernstein-von Mises theorem proved by Bickel and Kleijn (2012). On the other hand, the frequentist estimator of the support function in Kaido and Santos (2012) has a linear representation in terms of $\hat\Psi(\theta_*(p)) - \Psi(\theta_*(p), \phi_0)$, where $\hat\Psi(\theta_*(p))$ is a sample analog of $\Psi(\theta_*(p), \phi_0)$ and is therefore semi-parametric efficient. This implies that the asymptotic variances of the support function from both Bayesian and frequentist approaches are the same.

The asymptotic normality of the posterior of $S_\phi(p)$ also implies that the posterior coverage and the coverage under the limiting normal of our two-sided BCS for the identified set, which we construct in the next section, are the same.



6 Bayesian Credible Sets 

In this section we focus on two kinds of credible sets: credible sets for $\theta$ and credible sets for the identified set $\Theta(\phi)$.

6.1 Credible set for $\theta$

Bayesian inference on $\theta$ can be carried out through finite-sample Bayesian credible sets (BCS). A BCS is a set $BCS(\tau)$ such that

$$P(\theta \in BCS(\tau)|D_n) = 1 - \tau \qquad (6.1)$$

at level $1 - \tau$, for $\tau \in (0, 1)$. Clearly, such a set is not unique. One popular choice of credible set is the highest-probability-density (HPD) set, which has been widely used in empirical studies and also in the Bayesian partial identification literature, e.g., Moon and Schorfheide (2012) and Norets and Tang (2012).

The BCS can then be compared with the frequentist confidence set (FCS). Let $P_{D_n}(\cdot)$ denote the probability measure based on the sampling distribution, where $(\theta, \phi, \eta) = (\theta_0, \phi_0, \eta_0)$ or $(\theta, F) = (\theta_0, F_0)$. A frequentist confidence set $FCS(\tau)$ for $\theta_0$ satisfies

$$\liminf_{n \to \infty} \inf_{\phi \in \Phi} \inf_{\theta_0 \in \Theta(\phi)} P_{D_n}(\theta_0 \in FCS(\tau)) \geq 1 - \tau.$$

Various procedures have been proposed in the frequentist literature to construct an $FCS(\tau)$ that satisfies the above inequality. One of the key properties of these proposed FCS is that they are based on some consistent estimator $\hat\phi$ of $\phi_0$, and $\Theta(\hat\phi) \subset FCS(\tau)$. Moon and Schorfheide (2012) compared HPD with FCS and showed that in a parametric Bayesian model with known likelihood, for any $\tau > 0$, $P(\theta \in HPD(\tau), \theta \notin FCS(\tau)|D_n) = o_p(1)$, that is, the FCS is too large for Bayesian inference. Under the more robust semi-parametric Bayesian setup, the frequentist confidence set is also "too big" from the Bayesian point of view, as shown by Theorem 6.1 below.
The following assumption is needed.

Assumption 6.1. (i) The frequentist $FCS(\tau)$ is such that there is $\hat\phi$ with $\|\hat\phi - \phi_0\| = o_p(1)$ satisfying $\Theta(\hat\phi) \subset FCS(\tau)$.
(ii) $\sup_{(\theta, \phi) \in \Theta \times \Phi} \pi(\theta|\phi) < \infty$.

Many frequentist FCS's satisfy condition (i); see, e.g., Imbens and Manski (2004), Chernozhukov, Hong and Tamer (2007), Rosen (2008), Andrews and Soares (2010), etc. Condition (ii) is easy to verify since $\Theta \times \Phi$ is compact. Examples of $\pi(\theta|\phi)$ include: the uniform prior with density

$$\pi(\theta|\phi) = \mu(\Theta(\phi))^{-1} I_{\theta \in \Theta(\phi)},$$

where $\mu(\cdot)$ denotes the Lebesgue measure; and the truncated normal prior with density

$$\pi(\theta|\phi) = \left[ \int_{\Theta(\phi)} h(x; \lambda, \Sigma)\,dx \right]^{-1} h(\theta; \lambda, \Sigma)\, I_{\theta \in \Theta(\phi)},$$

where $h(x; \lambda, \Sigma)$ is the density function of the multivariate normal $N(\lambda, \Sigma)$.

Theorem 6.1. Under Assumption 4.1, the assumptions of Theorem 5.1 with $r_n = \sqrt{(\log n)/n}$, and Assumption 6.1, for any $\tau > 0$,

(i)

$$P(\theta \notin FCS(\tau)|D_n) = o_p(1)$$

(ii)

$$P(\theta \in FCS(\tau), \theta \notin BCS(\tau)|D_n) \to^p \tau.$$



Remark 6.1. Theorem 6.1 (i) shows that the posterior probability that $\theta$ lies inside the frequentist confidence set is arbitrarily close to one as $n \to \infty$. This indicates that the FCS is too big for insightful statistical inference from the Bayesian point of view. On the other hand, (ii) demonstrates that, with a nonnegligible probability, the FCS is strictly larger than the BCS. Therefore, the FCS is conservative from a Bayesian perspective.

Remark 6.2. Similar results have been shown by Moon and Schorfheide (2012) when HPD is used as the Bayesian credible set. The result presented here, besides allowing a semi-parametric likelihood function, is more general. Our proof for part (i) is slightly different from the proof in Moon and Schorfheide (2012, Corollary 1), in that we rely on the continuity of $d(\Theta(\phi), \Theta(\phi_0))$ with respect to $\phi$, which is obtained through an asymptotic expansion of the support function. The proof for part (ii) follows the same argument as in Moon and Schorfheide (2012).

6.2 Two-sided credible set for $\Theta(\phi)$

We now construct an asymptotically valid BCS for $\Theta(\phi)$. We aim to construct two-sided credible sets $A_1$ and $A_2$ such that

$$P(A_1 \subset \Theta(\phi) \subset A_2 | D_n) \geq 1 - \tau$$

with probability approaching one. The one-sided set $A_2$ is easy to obtain. As suggested in an earlier circulated version of Moon and Schorfheide (2012) and in Norets and Tang (2012), suppose $BCS_\phi(\tau)$ is a $1 - \tau$ Bayesian credible set of $\phi$; then it is easy to show that

$$P\Big(\Theta(\phi) \subset \bigcup_{x \in BCS_\phi(\tau)} \Theta(x) \,\Big|\, D_n\Big) = 1 - \tau$$

for every sampling sequence $D_n$. However, it is difficult to extend the idea of using the BCS of $\phi$ to construct the two-sided sets, more specifically, to construct the lower set $A_1$. In this section, we apply a new idea, with the help of the support function, for such a task. To the best of our knowledge, this is the first construction of a two-sided BCS for $\Theta(\phi)$ in the literature.

To illustrate why the support function can help, for $\Theta(\phi)$, recall its $\epsilon$-envelope $\Theta(\phi)^{\epsilon} = \{\theta \in \Theta : d(\theta, \Theta(\phi)) \leq \epsilon\}$ and its $\epsilon$-contraction $\Theta(\phi)^{-\epsilon} = \{\theta \in \Theta(\phi) : d(\theta, \Theta \backslash \Theta(\phi)) \geq \epsilon\}$, where $\epsilon \geq 0$ and $\Theta \backslash \Theta(\phi) = \{\theta \in \Theta : \theta \notin \Theta(\phi)\}$, as defined previously. If $\Theta(\phi_1)$, $\Theta(\phi_2)^{\epsilon}$ and $\Theta(\phi_3)^{-\epsilon}$ are convex, for some $\phi_1$, $\phi_2$ and $\phi_3 \in \Phi$, then we have:

$$\Theta(\phi_1) \subset \Theta(\phi_2)^{\epsilon} \ \text{ if and only if } \ \sup_{\|p\| = 1} (S_{\phi_1}(p) - S_{\phi_2}(p)) \leq \epsilon$$

and

$$\Theta(\phi_3)^{-\epsilon} \subset \Theta(\phi_1) \ \text{ if and only if } \ \sup_{\|p\| = 1} (S_{\phi_3}(p) - S_{\phi_1}(p)) \leq \epsilon.$$

Let $\hat\phi_M$ be the posterior mode, that is, $\hat\phi_M = \arg\max_\phi p(\phi|D_n)$. Then for any $c_n > 0$,

$$P(\Theta(\hat\phi_M)^{-c_n} \subset \Theta(\phi) \subset \Theta(\hat\phi_M)^{c_n} | D_n) = P\Big(\sup_{\|p\| = 1} |S_\phi(p) - S_{\hat\phi_M}(p)| \leq c_n \,\Big|\, D_n\Big).$$

Note that the right hand side of the above equation depends on the posterior of the support function. The posterior mode is only an example; any consistent estimator could be used to construct the two-sided credible region. Let $q_\tau$ be the $1 - \tau$ quantile of the posterior of

$$J(\phi) = \sqrt{n} \sup_{\|p\| = 1} |S_\phi(p) - S_{\hat\phi_M}(p)|$$

so that

$$P(J(\phi) \leq q_\tau | D_n) = 1 - \tau. \qquad (6.2)$$

The posterior of $J(\phi)$ is determined by that of $\phi$. Hence $q_\tau$ can be simulated from the MCMC draws of $p(\phi|D_n)$. Immediately, we have the following theorem:

Theorem 6.2. Suppose for any $\tau \in (0, 1)$, $q_\tau$ is defined as in (6.2). Then for every sampling sequence $D_n$,

$$P(\Theta(\hat\phi_M)^{-q_\tau/\sqrt{n}} \subset \Theta(\phi) \subset \Theta(\hat\phi_M)^{q_\tau/\sqrt{n}} | D_n) = 1 - \tau.$$
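The set inclusions behind this construction can be checked numerically in the simplest case of intervals on the real line, where the only unit directions are -1 and +1. The sketch below is only an illustration under that assumption: the helper names are hypothetical, and the contraction is assumed to be non-empty.

```python
def support(interval, p):
    """Support function of [a, b] in direction p in {-1, +1}."""
    a, b = interval
    return b if p > 0 else -a


def envelope(interval, eps):
    """eps-envelope of [a, b]: all points within distance eps of the interval."""
    a, b = interval
    return (a - eps, b + eps)


def contraction(interval, eps):
    """eps-contraction of [a, b]: points at least eps away from the complement."""
    a, b = interval
    return (a + eps, b - eps)


def is_subset(inner, outer):
    """True if the interval `inner` is contained in the interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def sup_support_gap(i1, i2):
    """sup over unit directions of S_1(p) - S_2(p)."""
    return max(support(i1, p) - support(i2, p) for p in (-1, 1))


if __name__ == "__main__":
    center, eps = (0.35, 0.65), 0.05
    for candidate in [(0.33, 0.68), (0.25, 0.62), (0.42, 0.66)]:
        upper_ok = is_subset(candidate, envelope(center, eps))
        lower_ok = is_subset(contraction(center, eps), candidate)
        print(candidate, upper_ok, sup_support_gap(candidate, center) <= eps,
              lower_ok, sup_support_gap(center, candidate) <= eps)
```

In each row the geometric inclusion and the corresponding support-function inequality agree, which is what allows the two-sided credible set to be calibrated through the posterior of $J(\phi)$ alone.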

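Remark 6.3. It is straightforward to construct the one-sided BCS for $\Theta(\phi)$ using the described procedure. For example, let $\tilde q_\tau$ and $\bar q_\tau$ be such that $P(\sqrt{n} \sup_{\|p\| = 1} (S_\phi(p) - S_{\hat\phi_M}(p)) \leq \tilde q_\tau | D_n) = 1 - \tau$ and $P(\sqrt{n} \sup_{\|p\| = 1} (S_{\hat\phi_M}(p) - S_\phi(p)) \leq \bar q_\tau | D_n) = 1 - \tau$. Then $P(\Theta(\phi) \subset \Theta(\hat\phi_M)^{\tilde q_\tau/\sqrt{n}} | D_n) = 1 - \tau$ and $P(\Theta(\hat\phi_M)^{-\bar q_\tau/\sqrt{n}} \subset \Theta(\phi) | D_n) = 1 - \tau$ for every sampling sequence $D_n$.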



6.3 Frequentist coverage probability of BCS for $\Theta(\phi)$

As shown in Theorem 6.1, the BCS for $\theta$ does not have correct frequentist coverage when $\theta$ is partially identified, since the BCS tends to be a subset of the interior of the FCS. Gustafson (2012) showed that, from a frequentist point of view, there is always a region of the identified set which the Bayesian credible interval fails to cover.

In contrast, the constructed two-sided BCS for the identified set has desirable frequentist properties. Recently, Kitagawa (2012) constructed a one-sided credible set that also has correct frequentist coverage probability when $\Theta(\phi)$ is a one-dimensional interval for a scalar parameter. The frequentist coverage probability of a more general multi-dimensional BCS has been largely unknown in the literature. Our two-sided BCS is constructed based on the support function, for which the Bernstein-von Mises theorem holds (see Theorem 5.3) in the moment inequality model, which implies that the frequentist coverage probability is asymptotically correct. We show this in Theorem 6.3 below.

The analysis relies on the following assumption, which requires the asymptotic normality of the posterior mode of $\phi$ (or of the consistent estimator used to construct the BCS). The asymptotic normality of posterior modes has long been recognized, and holds under mild conditions.

Assumption 6.2. The posterior mode $\hat\phi_M$ is such that

$$\sqrt{n}(\hat\phi_M - \phi_0) \to^d N(0, I_{\phi_0}^{-1})$$

where $I_{\phi_0}$ denotes the semi-parametric efficient information matrix as in Assumption 5.7.

Theorem 6.3. Consider the moment inequality model in (4.1). If assumptions 5.1-5.6 hold with $\delta = r_n = \sqrt{(\log n)/n}$, then the constructed two-sided Bayesian credible set has asymptotically correct frequentist coverage probability, that is,

$$P_{D_n}(\Theta(\hat\phi_M)^{-q_\tau/\sqrt{n}} \subset \Theta(\phi_0) \subset \Theta(\hat\phi_M)^{q_\tau/\sqrt{n}}) \geq 1 - \tau + o_p(1).^4$$



Similarly, we can show that the one-sided BCS's constructed in Remark 6.3 also have asymptotically correct coverage probabilities. For example, for $\tilde q_\tau$ such that $P(\sqrt{n} \sup_{\|p\| = 1} (S_\phi(p) - S_{\hat\phi_M}(p)) \leq \tilde q_\tau | D_n) = 1 - \tau$,

$$P_{D_n}(\Theta(\phi_0) \subset \Theta(\hat\phi_M)^{\tilde q_\tau/\sqrt{n}}) \geq 1 - \tau + o_p(1). \qquad (6.3)$$

Remark 6.4. Our BCS is constructed based on the support function, whose frequentist coverage probability is guaranteed by the Bernstein-von Mises theorem of the support function, proved in Theorem 5.3. Since the normal distribution is also the limiting distribution for efficient frequentist inference about the support function (see Kaido and Santos 2011), our two-sided BCS can be interpreted as an asymptotically efficient confidence region for the identified set.^5



We can also use $\Theta(\hat\phi_M)^{q_\tau/\sqrt{n}}$ as the frequentist confidence set for $\theta_0$, which then has asymptotically correct frequentist coverage probability. The result is stated as follows:

Corollary 6.1. Under the assumptions of Theorem 6.3,

$$\inf_{\theta \in \Theta(\phi_0)} P_{D_n}(\theta \in \Theta(\hat\phi_M)^{q_\tau/\sqrt{n}}) \geq 1 - \tau + o_p(1).$$



4 The result presented here is understood as: There is a random sequence $\Delta(D_n)$ that depends on $D_n$ such that $\Delta(D_n) = o_p(1)$, and for any sampling sequence $D_n$, we have $P_{D_n}(\Theta(\hat\phi_M)^{-q_\tau/\sqrt{n}} \subset \Theta(\phi_0) \subset \Theta(\hat\phi_M)^{q_\tau/\sqrt{n}}) \geq 1 - \tau + \Delta(D_n)$. A similar interpretation applies to (6.3) and Corollary 6.1.

5 The asymptotic efficiency based on the support function is achieved by Kaido and Santos (2011, Theorem 5.4).

6.4 Missing data: an example

We illustrate our method using a missing data example, which was discussed thoroughly by Manski (2004). For simplicity of exposition, we present the simplest version. Let $Y$ be a binary variable, indicating whether a treatment is successful ($Y = 1$) or not ($Y = 0$). However, $Y$ is subject to missingness. We write $M = 0$ if $Y$ is missing, and $M = 1$ otherwise. Hence we in fact observe $(M, MY)$. The parameter of interest is $\theta = P(Y = 1)$, the probability of success. Moreover, we denote the identified parameters

$$\phi_1 = P(M = 1), \qquad \phi_2 = P(Y = 1|M = 1).$$

Let $\phi_0 = (\phi_{10}, \phi_{20})$ be the true value of $\phi = (\phi_1, \phi_2)$. Then without further assumptions on $P(Y = 1|M = 0)$, $\theta$ is only partially identified on $\Theta(\phi_0)$, where $\Theta(\phi) = [\phi_1\phi_2,\ \phi_1\phi_2 + 1 - \phi_1]$. The support function is easy to calculate:

$$S_\phi(1) = \phi_1\phi_2 + 1 - \phi_1, \qquad S_\phi(-1) = -\phi_1\phi_2.$$

Suppose we observe i.i.d. data $\{(M_i, Y_i M_i)\}_{i \leq n}$, and find that $\sum_{i=1}^n M_i = n_1$ and $\sum_{i=1}^n Y_i M_i = n_2$, the numbers of nonmissing observations and observed successes respectively. In this example, the true likelihood function $L(\phi) \propto \phi_1^{n_1}(1 - \phi_1)^{n - n_1} \phi_2^{n_2}(1 - \phi_2)^{n_1 - n_2}$ is known.

We place independent Beta priors $Beta(\alpha_1, \beta_1)$ and $Beta(\alpha_2, \beta_2)$ on $(\phi_1, \phi_2)$. The uniform distribution is a special case of the Beta prior. Then the posterior of $(\phi_1, \phi_2)$ is a product of $Beta(\alpha_1 + n_1, \beta_1 + n - n_1)$ and $Beta(\alpha_2 + n_2, \beta_2 + n_1 - n_2)$. If, in addition, we have subjective prior information on $\theta$ and place a prior $\pi(\theta|\phi)$ supported on $\Theta(\phi)$, then by integrating out $\phi$ we immediately obtain the marginal posterior of $\theta$.

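We now present the BCS for $\Theta(\phi)$ obtained by using the support function of $\Theta(\phi)$. First, by taking the derivative of $p(\phi|D_n)$, we obtain the posterior mode: $\hat\phi_{1M} = (n_1 + \alpha_1 - 1)/(n + \alpha_1 + \beta_1 - 2)$ and $\hat\phi_{2M} = (n_2 + \alpha_2 - 1)/(n_1 + \alpha_2 + \beta_2 - 2)$. Then

$$J(\phi) = \sqrt{n} \max\left\{ |\phi_1\phi_2 - \phi_1 - \hat\phi_{1M}\hat\phi_{2M} + \hat\phi_{1M}|,\ |\phi_1\phi_2 - \hat\phi_{1M}\hat\phi_{2M}| \right\}.$$

Let $q_\tau$ be the $1 - \tau$ quantile of the posterior of $J(\phi)$, which can be obtained by simulating from the Beta distributions. The lower and upper $1 - \tau$ level BCS's for $\Theta(\phi)$ are $\Theta(\hat\phi_M)^{-q_\tau/\sqrt{n}} \subset \Theta(\phi) \subset \Theta(\hat\phi_M)^{q_\tau/\sqrt{n}}$, where

$$\Theta(\hat\phi_M)^{-q_\tau/\sqrt{n}} = [\hat\phi_{1M}\hat\phi_{2M} + q_\tau/\sqrt{n},\ \hat\phi_{1M}\hat\phi_{2M} + 1 - \hat\phi_{1M} - q_\tau/\sqrt{n}],$$
$$\Theta(\hat\phi_M)^{q_\tau/\sqrt{n}} = [\hat\phi_{1M}\hat\phi_{2M} - q_\tau/\sqrt{n},\ \hat\phi_{1M}\hat\phi_{2M} + 1 - \hat\phi_{1M} + q_\tau/\sqrt{n}],$$

which are also two-sided asymptotic $1 - \tau$ frequentist confidence intervals for the true $\Theta(\phi_0)$.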

Here we present a simple simulated example, where the true $\phi_0 = (0.7, 0.5)$. This implies that the true identified interval is $[0.35, 0.65]$ and that about thirty percent of the simulated data are "missing". Suppose we have no prior knowledge about the true $\phi_0$, and place a uniform prior on it. Thus $\alpha_1 = \alpha_2 = \beta_1 = \beta_2 = 1$. In addition, $B = 1{,}000$ posterior draws $\{\phi^i\}_{i=1}^B$ are sampled from $p(\phi_1, \phi_2|D_n) \sim Beta(n_1 + 1, n - n_1 + 1) \times Beta(n_2 + 1, 1 + n_1 - n_2)$. Then, for each draw we compute $J(\phi^i)$ and set $q_{0.05}$ as the 95% upper quantile of $\{J(\phi^i)\}_{i=1}^B$ to obtain the critical value of the BCS and construct the two-sided BCS for the identified set. Each simulation is repeated 1000 times to calculate the coverage frequency of the true identified interval. See Table 1 for the results.
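The simulation design just described can be sketched in a few lines of standard-library Python. This is a minimal illustration under the stated uniform priors, not the code used to produce the reported results: the function names, the seed, and the default numbers of draws and replications are assumptions, and coverage is checked through the support-function criterion $\max_p |S_{\hat\phi_M}(p) - S_{\phi_0}(p)| \leq q_\tau/\sqrt{n}$.

```python
import math
import random


def identified_interval(phi1, phi2):
    """Identified interval [phi1*phi2, phi1*phi2 + 1 - phi1] for theta = P(Y = 1)."""
    lower = phi1 * phi2
    return lower, lower + 1.0 - phi1


def bcs_halfwidth(n, n1, n2, tau, n_draws, rng):
    """Return the interval at the posterior mode and q_tau / sqrt(n), uniform priors."""
    mode1 = n1 / n
    mode2 = n2 / n1 if n1 > 0 else 0.5  # arbitrary fallback if nothing is observed
    lo_m, up_m = identified_interval(mode1, mode2)
    j_draws = []
    for _ in range(n_draws):
        d1 = rng.betavariate(n1 + 1, n - n1 + 1)
        d2 = rng.betavariate(n2 + 1, n1 - n2 + 1)
        lo, up = identified_interval(d1, d2)
        # J(phi) = sqrt(n) * max over p in {-1, +1} of |S_phi(p) - S_mode(p)|
        j_draws.append(math.sqrt(n) * max(abs(up - up_m), abs(lo - lo_m)))
    j_draws.sort()
    k = min(n_draws - 1, math.ceil((1.0 - tau) * n_draws) - 1)
    return (lo_m, up_m), j_draws[k] / math.sqrt(n)


def coverage_frequency(n, phi0=(0.7, 0.5), tau=0.05, n_draws=1000, reps=1000, seed=0):
    """Frequency with which the two-sided BCS covers the true identified interval."""
    rng = random.Random(seed)
    lo0, up0 = identified_interval(*phi0)
    hits = 0
    for _ in range(reps):
        n1 = sum(rng.random() < phi0[0] for _ in range(n))   # non-missing count
        n2 = sum(rng.random() < phi0[1] for _ in range(n1))  # observed successes
        (lo_m, up_m), eps = bcs_halfwidth(n, n1, n2, tau, n_draws, rng)
        if max(abs(lo_m - lo0), abs(up_m - up0)) <= eps:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for n in (50, 100, 500):
        print(n, coverage_frequency(n, reps=200))
```

Because the posterior is a product of Beta distributions, no MCMC is needed here; in models without a conjugate posterior the same quantile step would be applied to MCMC draws of $\phi$.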

Table 1: Frequentist coverage probability of the true identified interval

n      Lower    Upper    TwoSided
50     0.967    0.954    0.927
100    0.975    0.971    0.954
500    0.976    0.974    0.953

Lower, Upper and TwoSided represent the frequencies of the events $\Theta(\hat\phi_M)^{-q_\tau/\sqrt{n}} \subset \Theta(\phi_0)$; $\Theta(\phi_0) \subset \Theta(\hat\phi_M)^{q_\tau/\sqrt{n}}$; and $\Theta(\hat\phi_M)^{-q_\tau/\sqrt{n}} \subset \Theta(\phi_0) \subset \Theta(\hat\phi_M)^{q_\tau/\sqrt{n}}$ over 1000 replicates.

7 From Partial Identification to Point Identification

We have been focusing on partially identified models. However, the results derived for the identified set and the support function are still valid when point identification is achieved. This is important because in many cases it is possible that we actually have point identification and, in that event, $\Theta(\phi)$ degenerates to a singleton. For example, in the interval censored model, it is possible that $EY_1 = EY_2$, in which case $\theta = EY$ is point identified.

When point identification is indeed achieved, the one-sided coverage results $\Theta(\phi) \subset \Theta(\hat\phi_M)^{q_\tau/\sqrt{n}}$ and $\Theta(\hat\phi_M)^{-q_\tau/\sqrt{n}} \subset \Theta(\phi_0)$ in Theorems 6.2 and 6.3 and the asymptotic normality for the posterior of the support function of Theorem 5.3 still hold, because they are generally guaranteed by the semi-parametric Bernstein-von Mises theorem for $\phi$ when $\Theta(\phi)$ is a singleton (e.g., Rivoirard and Rousseau 2012, Bickel and Kleijn 2011). Theorem 4.1 is also guaranteed by the concentration theory for the posterior of $\phi$ (Assumption 4.1), which then implies the posterior consistency of the support function, with the same concentration rate as in Theorem 5.2.

When $\theta$ is identified, $\{\theta\} = \Theta(\phi) = f(\phi)$, which is a function of $\phi$, and $S_\phi(p) = p^T\theta$. Thus the posterior of $\theta$ is the same as the posterior of $\Theta(\phi)$, which is completely determined by that of $\phi$ under "smoothness" conditions on $\Psi$. As a result, Theorem 3.1 is still valid because it is implied by Theorem 4.1, which also follows directly from the posterior consistency of $\phi$ if $f(\cdot)$ is continuous at $\phi_0$. Theorem 6.1, however, no longer holds, because when $\theta$ is point identified its BCS and FCS are asymptotically identical due to the Bernstein-von Mises theorem. As a result, the BCS for $\theta$ will have correct frequentist coverage probability asymptotically.

8 Financial Asset Pricing 
8.1 The model 

Asset pricing models state that the equilibrium price $P_t^i$ of a financial asset $i$ is equal to

$$P_t^i = E[M_{t+1} P_{t+1}^i | \mathcal{I}_t], \qquad i = 1, \ldots, N$$

where $P_{t+1}^i$ denotes the price of asset $i$ at period $(t + 1)$, $M_{t+1}$ is the stochastic discount factor (SDF hereafter) and $\mathcal{I}_t$ denotes the information set at time $t$. In vector form this can be rewritten as

$$\iota = E[M_{t+1} R_{t+1} | \mathcal{I}_t]$$

where $\iota$ is the $N$-dimensional vector of ones and $R_{t+1}$ is the $N$-dimensional vector of gross asset returns at time $(t + 1)$: $R_{t+1} = (r_{1,t+1}, \ldots, r_{N,t+1})'$ with $r_{i,t+1} = P_{t+1}^i / P_t^i$. This model can be reinterpreted as a model of the SDF and may be used to detect the SDFs that are compatible with asset return data. Hansen and Jagannathan (1991) obtained a lower bound on the volatility of SDFs that could be compatible with a given SDF mean value and a given set of asset return data. Therefore, the set of SDFs $M_{t+1}$ that can price existing assets generally forms a proper set.

Let $m$ and $\Sigma$ denote, respectively, the vector of unconditional mean returns and the covariance matrix of returns of the $N$ risky assets, that is, $m = E(R_{t+1})$ and $\Sigma = E(R_{t+1} - m)(R_{t+1} - m)'$. Denote $\mu = E(M_{t+1})$ and $\sigma^2 = Var(M_{t+1})$. We assume that $m$, $\Sigma$, $\mu$ and $\sigma^2$ do not vary with $t$. Hansen and Jagannathan (1991) show that the minimum variance $\sigma_\phi^2(\mu)$ achievable by an SDF with mean $\mu$ and compatible with the observed $(m, \Sigma)$ is given by

$$\sigma_\phi^2(\mu) = (\iota - \mu m)' \Sigma^{-1} (\iota - \mu m) =: \phi_1 \mu^2 - 2\phi_2 \mu + \phi_3$$

$$\text{with } \phi_1 = m'\Sigma^{-1}m, \quad \phi_2 = m'\Sigma^{-1}\iota, \quad \phi_3 = \iota'\Sigma^{-1}\iota. \qquad (8.1)$$

Therefore, an SDF correctly prices an asset only if, for given $(m, \Sigma)$, its mean $\mu$ and variance $\sigma^2$ are such that $\sigma^2 \geq \sigma_\phi^2(\mu)$. An SDF's mean and variance $(\mu, \sigma^2)$ are said to be admissible if they satisfy this inequality, and we define the set of admissible SDF means and variances as

$$\Theta(\phi) = \{(\mu, \sigma^2) \in \Theta;\ \sigma_\phi^2(\mu) - \sigma^2 \leq 0\} \qquad (8.2)$$

where $\phi = (\phi_1, \phi_2, \phi_3)'$ and $\Theta \subset \mathbb{R}_+ \times \mathbb{R}_+$ is a compact set that we can choose based, for instance, on some prior knowledge. Usually, we can fix upper bounds $\bar\mu > 0$ and $\bar\sigma > 0$ as large as we want and take $\Theta = [0, \bar\mu] \times [0, \bar\sigma^2]$. In practice, $\bar\mu$ and $\bar\sigma$ must be chosen sufficiently large that $\Theta(\phi)$ is non-empty. Making inference on $\Theta(\phi)$ allows one to check whether a family of SDFs (and hence a given utility function) prices a financial asset correctly or not. Frequentist inference for this set is carried out in Chernozhukov, Kocatulum and Menzel (2012).

We develop a Bayesian approach. Using our previous notation, we define $\theta = (\mu, \sigma^2)$ and

$$\Psi(\theta, \phi) = \phi_1 \mu^2 - 2\phi_2 \mu + \phi_3 - \sigma^2.$$
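To make the mapping from $(m, \Sigma)$ to $\phi$ concrete, the sketch below computes $\phi_1, \phi_2, \phi_3$ from (8.1) for two assets and checks that the quadratic form in $\mu$ matches the direct expression $(\iota - \mu m)'\Sigma^{-1}(\iota - \mu m)$. The restriction to $N = 2$ (so that a closed-form linear solve suffices), the helper names, and the numerical values of $m$ and $\Sigma$ are illustrative assumptions.

```python
def solve_2x2(a, b):
    """Solve a x = b for a 2x2 matrix a (list of rows) using Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def hj_coefficients(m, sigma):
    """phi = (m' S^-1 m, m' S^-1 iota, iota' S^-1 iota) as in (8.1), for N = 2."""
    iota = [1.0, 1.0]
    sinv_m = solve_2x2(sigma, m)
    sinv_iota = solve_2x2(sigma, iota)
    return dot(m, sinv_m), dot(m, sinv_iota), dot(iota, sinv_iota)


def hj_bound(mu, phi):
    """Minimum SDF variance phi1*mu^2 - 2*phi2*mu + phi3 at SDF mean mu."""
    phi1, phi2, phi3 = phi
    return phi1 * mu * mu - 2.0 * phi2 * mu + phi3


def hj_bound_direct(mu, m, sigma):
    """Same bound computed directly as (iota - mu m)' S^-1 (iota - mu m)."""
    v = [1.0 - mu * mi for mi in m]
    return dot(v, solve_2x2(sigma, v))


def is_admissible(mu, sigma2, phi):
    """Membership in the identified set: Psi(theta, phi) <= 0."""
    return hj_bound(mu, phi) - sigma2 <= 0.0


if __name__ == "__main__":
    m = [1.05, 1.10]                        # hypothetical mean gross returns
    sigma = [[0.04, 0.01], [0.01, 0.09]]    # hypothetical return covariance
    phi = hj_coefficients(m, sigma)
    print(phi, hj_bound(0.95, phi), hj_bound_direct(0.95, m, sigma))
```

Each posterior draw of $(m, \Sigma)$ can be pushed through `hj_coefficients` in this way to obtain a posterior draw of $\phi$.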

8.2 Support function 

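In this case $k = 1$ and $\Psi(\theta, \phi)$ is convex in $\theta$. More precisely, $\Psi(\theta, \phi)$ is linear in $\sigma^2$ and strictly convex in $\mu$ (because $\Sigma$ positive definite implies that $\phi_1 > 0$). Thus assumption 5.1 is verified. Assumption 5.3 (i) is also trivially satisfied. Moreover, $\Theta(\phi)$ is empty when $\bar\sigma^2 < \phi_1\mu^2 - 2\phi_2\mu + \phi_3$ for all $\mu \in [0, \bar\mu]$. This happens in three cases: either (I) for $\phi_2^2 - \phi_1\phi_3 + \phi_1\bar\sigma^2 < 0$, or (II) for $\phi_2^2 - \phi_1\phi_3 + \phi_1\bar\sigma^2 \geq 0$ such that $\bar\mu < \frac{\phi_2 - \sqrt{\phi_2^2 - \phi_1\phi_3 + \phi_1\bar\sigma^2}}{\phi_1}$, or (III) for $\phi_2^2 - \phi_1\phi_3 + \phi_1\bar\sigma^2 \geq 0$ such that $\frac{\phi_2 + \sqrt{\phi_2^2 - \phi_1\phi_3 + \phi_1\bar\sigma^2}}{\phi_1} < 0$. Therefore, assumption 5.3 (ii) is verified for every $\phi$ such that (I), (II) and (III) do not hold. This is easily possible by taking $\bar\mu$ sufficiently large and $\bar\sigma^2$ not too large. Assumptions 5.3 (iii)-(v) and 5.4 are also satisfied.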

In this example we can make inference on the support function of $\Theta(\phi)$ without requiring that assumption 5.5 (ii) hold. In fact, assumption 5.5 (i) holds for every $\phi \in \Phi$ and for every $p \in S^2$ except for $p = (1, 0)$, $p = (-1, 0)$ and $p = (0, 1)$. For these values of $p$, however, it is easy to show that the support function is differentiable at $\phi_0$ without assumption 5.5; see appendix A.2. Assumption 5.6 (ii) is trivially satisfied since $\|\nabla_\phi \Psi(\theta, \phi_1) - \nabla_\phi \Psi(\theta, \phi_2)\| = 0$, assumption 5.6 (iii) is satisfied with $K = 1$, and assumption 5.6 (iv) is true due to the continuity of $\Psi(\theta, \cdot)$ in $\phi$. Assumption 5.6 (i) must be checked case by case (that is, for every region of values of $p$) since $\lambda(p, \phi)$ takes a different expression in each case; see appendix A.2.

Under assumption 5.1 we can rewrite

$$\Xi(p, \phi) = \arg\max_{\theta \in \Theta} \{p^T\theta;\ \Psi(\theta, \phi) \leq 0\}$$
$$= \arg\max_{\theta \in \Theta} \{p_1\mu + p_2\sigma^2 - \lambda(p, \phi)(\phi_1\mu^2 - 2\phi_2\mu + \phi_3 - \sigma^2)\}$$
$$= \arg\max_{0 \leq \mu \leq \bar\mu,\ 0 \leq \sigma^2 \leq \bar\sigma^2} \{p_1\mu + p_2\sigma^2 - \lambda(p, \phi)(\phi_1\mu^2 - 2\phi_2\mu + \phi_3 - \sigma^2)\}$$

where $p = (p_1, p_2)$, $\lambda_2 \geq 0$ and $\lambda_3 \geq 0$. The support function and $\Xi(p, \phi)$ have explicit expressions, but these are long and complicated. We present them in Appendix A.2.

8.3 Dirichlet process prior 

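Let $F$ denote a probability distribution. The Bayesian model is $R_t|F \sim F$ and $\psi = (m, \Sigma) = \psi(F)$, where

$$\psi_1(F) = \int r F(dr), \qquad \psi_2(F) = \int r r^T F(dr) - \int r F(dr) \int r F(dr)^T.$$

Let us impose a Dirichlet process prior for $F$, with parameter $\nu_0$ and base probability measure $F_0$ on $\mathbb{R}^N$. By the decomposition of Sethuraman (1994), the Dirichlet process prior induces a prior for $\psi$ as: $m = \sum_{j=1}^\infty \alpha_j \xi_j$ and $\Sigma = \sum_{j=1}^\infty \alpha_j \xi_j \xi_j^T - \big(\sum_{j=1}^\infty \alpha_j \xi_j\big)\big(\sum_{j=1}^\infty \alpha_j \xi_j\big)^T$, where the $\xi_j$ are independently sampled from $F_0$ and $\alpha_j = u_j \prod_{l=1}^{j-1}(1 - u_l)$ with $\{u_j\}_{j=1}^\infty$ drawn from $Beta(1, \nu_0)$. These priors then induce a prior for $\phi$. The posterior distribution for $(m, \Sigma)$ can be calculated explicitly:

$$\Sigma|D_n \sim (1 - \gamma) \sum_{j=1}^\infty \alpha_j \xi_j \xi_j^T + \gamma \sum_{t=1}^n \beta_t R_t R_t^T - \Big((1 - \gamma) \sum_{j=1}^\infty \alpha_j \xi_j + \gamma \sum_{t=1}^n \beta_t R_t\Big)\Big((1 - \gamma) \sum_{j=1}^\infty \alpha_j \xi_j + \gamma \sum_{t=1}^n \beta_t R_t\Big)^T,$$

$$m|D_n \sim (1 - \gamma) \sum_{j=1}^\infty \alpha_j \xi_j + \gamma \sum_{t=1}^n \beta_t R_t, \qquad \gamma \sim Beta(n, \nu_0), \quad \{\beta_t\}_{t=1}^n \sim Dir(1, \ldots, 1).$$

We can then simulate the posterior for $\phi$ based on the distributions of $\Sigma|D_n$, $m|D_n$ and (8.1).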
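A draw from this posterior can be sketched as follows for a single asset (scalar returns). This is an illustration under several assumptions that are not taken from the text: the infinite stick-breaking sum is truncated at a finite number of atoms, the mixing weight on the empirical part is taken to be $\gamma \sim Beta(n, \nu_0)$ (the usual representation of a Dirichlet process posterior), the Dirichlet weights are generated from normalized exponentials, and all function names are hypothetical.

```python
import random


def stick_breaking_weights(nu0, n_atoms, rng):
    """Truncated stick-breaking weights alpha_j = u_j * prod_{l<j} (1 - u_l)."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        u = rng.betavariate(1.0, nu0)
        weights.append(u * remaining)
        remaining *= 1.0 - u
    weights[-1] += remaining  # fold the truncated tail mass into the last atom
    return weights


def dirichlet_ones(n, rng):
    """Dir(1, ..., 1) weights obtained by normalizing i.i.d. exponentials."""
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    return [x / total for x in e]


def draw_posterior_moments(returns, nu0, base_sampler, rng, n_atoms=200):
    """One posterior draw of (mean, variance) of F for scalar returns."""
    n = len(returns)
    gamma = rng.betavariate(n, nu0)              # weight on the data (assumed)
    alpha = stick_breaking_weights(nu0, n_atoms, rng)
    xi = [base_sampler(rng) for _ in range(n_atoms)]
    beta = dirichlet_ones(n, rng)
    atoms = xi + list(returns)
    weights = [(1.0 - gamma) * a for a in alpha] + [gamma * b for b in beta]
    mean = sum(w * x for w, x in zip(weights, atoms))
    second = sum(w * x * x for w, x in zip(weights, atoms))
    return mean, second - mean * mean


if __name__ == "__main__":
    rng = random.Random(0)
    data = [2.0 + rng.uniform(-1.0, 1.0) for _ in range(200)]
    print(draw_posterior_moments(data, 3.0, lambda r: r.gauss(0.0, 1.0), rng))
```

With $n$ much larger than $\nu_0$, most of the posterior mass sits on the Dirichlet-weighted data, so the draws behave much like a Bayesian bootstrap of the sample moments.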
8.4 Simulation 

We present a simple simulated example. The returns $R_t$ are assumed to follow a 2-factor model: $R_t = \Lambda f_t + u_t + 2\iota$, where $\Lambda$ is an $N \times 2$ matrix of factor loadings. The error terms $\{u_{it}\}_{i \leq N, t \leq n}$ are both cross-sectionally and serially independent, and are uniform $U[-2, 2]$. In addition, the components of $\Lambda$ are standard normal, and the factors are also uniform $U[-2, 2]$. The true $m = ER_t = 2\iota$, $\Sigma = \Lambda\Lambda' + I_N$. Note that in our DGP, the true likelihood is not Gaussian.

We set $N = 5$, $n = 200$. When calculating the posteriors, the DGP's distributions and the factor model structure are treated as unknown, and we apply the nonparametric Dirichlet process prior on the CDF of $R_t - m$, with parameter $\nu_0 = 3$ and base measure $F_0 = N(0, 1)$. We use a uniform prior for $(\sigma^2, \mu)$, and obtain the posterior distributions for $(m, \Sigma, \phi_1, \phi_2, \phi_3, \sigma^2, \mu)$. More concretely, the prior is assumed to be:

$$\pi(\sigma^2, \mu|\phi) = \pi(\sigma^2|\phi, \mu)\pi(\mu); \qquad \sigma^2|\phi, \mu \sim U[\sigma_\phi^2(\mu), \bar\sigma^2], \quad \mu \sim U[0, \bar\mu],$$

where $\mu$ and $\phi$ are a priori independent. We draw 1,000 times from the posterior of $(\phi, \sigma^2, \mu)$. Each time we first draw $(m, \Sigma)$ from their marginal posterior distributions, based on which we obtain the posterior draw of $\phi$ from (8.1). In addition, we draw $\mu$ uniformly from $[0, \bar\mu]$, and finally $\sigma^2$ uniformly from $[\sigma_\phi^2(\mu), \bar\sigma^2]$, where $\sigma_\phi^2(\mu)$ is calculated based on the drawn $\phi$ and $\mu$.

The posterior mean $(\hat\phi_1, \hat\phi_2, \hat\phi_3)$ of $\phi$ is calculated, based on which we calculate an estimate of the boundary of the identified set (we set $\bar\mu = 1.4$ and $\bar\sigma^2 = 6$):

$$A_1 = \{\mu \in [0, \bar\mu],\ \sigma^2 \in [0, \bar\sigma^2] : \sigma^2 = \hat\phi_1\mu^2 - 2\hat\phi_2\mu + \hat\phi_3\}.$$

In addition, we estimate the support function $S_\phi(p)$ using either the posterior mean $\hat\phi$ or the posterior mode $\hat\phi_M$. The theoretical marginal posterior for $\phi$ is hard to compute. Thus, to calculate the posterior mode, we first estimate the marginal posterior density for $\phi_j$ using kernel smoothing based on the draws $\{\phi_j^i\}_{i=1}^{1000}$. The posterior mode $\hat\phi_M$ is then given by the values that maximize the estimated marginal densities. The support function $S_\phi(p)$ is defined for $p_1^2 + p_2^2 = 1$. In Figure 1 we plot the posterior estimates of the support function for two cases: $p_2 \in [0, 1]$, $p_1 = \sqrt{1 - p_2^2}$, and $p_2 \in [-1, 0]$, $p_1 = -\sqrt{1 - p_2^2}$.

Figure 1: Posterior estimates of support function. Left panel is for $p_2 \in [0, 1]$, $p_1 = \sqrt{1 - p_2^2}$; right panel is for $p_2 \in [-1, 0]$, $p_1 = -\sqrt{1 - p_2^2}$.

[Figure: two panels, each titled "Support Function", with $p_2$ on the horizontal axis.]
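The kernel-smoothing step used to approximate the marginal posterior mode can be sketched as follows. The Gaussian kernel, the normal-reference bandwidth $1.06\,\hat s\,B^{-1/5}$ and the grid search are illustrative choices rather than choices stated in the text, and the function names are hypothetical.

```python
import math
import random


def rule_of_thumb_bandwidth(draws):
    """Normal-reference bandwidth 1.06 * sd * B^(-1/5)."""
    n = len(draws)
    mean = sum(draws) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def gaussian_kde(draws, bandwidth):
    """Return the Gaussian kernel density estimate as a function of x."""
    norm = 1.0 / (len(draws) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in draws)

    return density


def kde_mode(draws, grid_size=400):
    """Approximate the mode by maximizing the KDE over an equally spaced grid."""
    density = gaussian_kde(draws, rule_of_thumb_bandwidth(draws))
    lo, hi = min(draws), max(draws)
    grid = [lo + (hi - lo) * k / (grid_size - 1) for k in range(grid_size)]
    return max(grid, key=density)


if __name__ == "__main__":
    rng = random.Random(0)
    posterior_draws = [rng.gauss(2.0, 0.5) for _ in range(1000)]  # stand-in draws
    print(kde_mode(posterior_draws))
```

Applying `kde_mode` separately to the draws of each component $\phi_j$ gives a componentwise approximation of the mode, which is what the marginal-density description above suggests.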



Using the approximate posterior mode, we calculate the 95% posterior quantile $q_\tau$ for $J(\phi)$, based on which we construct the BCS $\Theta(\hat\phi_M)^{q_\tau/\sqrt{n}}$ for the identified set. The boundary of $\Theta(\hat\phi_M)^{q_\tau/\sqrt{n}}$ is given by

$$A_2 = \left\{\mu \in [0, \bar\mu],\ \sigma^2 \in [0, \bar\sigma^2] : \inf_z \sqrt{|z - \mu|^2 + |\sigma_{\hat\phi_M}^2(z) - \sigma^2|^2} = q_\tau/\sqrt{n}\right\}.$$

In Figure 2, we plot the posterior draws of $(\mu, \sigma^2)$, $A_1$, $A_2$ and the boundary of the true identified set.

Figure 2: 1,000 posterior draws of $(\mu, \sigma^2)$. Solid line is the boundary of the true identified set; dashed line represents the estimated boundary using the posterior mean; dotted line gives the 95% BCS.

9 Conclusion

We propose a semi-parametric Bayesian procedure for inference about partially identified models. Bayesian approaches are appealing in many respects. The classical Bayesian approach in this literature assumes a parametric model by specifying an ad hoc parametric likelihood function. However, econometric models usually only identify a set of moment inequalities, so assuming a known likelihood function carries a risk of misspecification and may result in inconsistent estimation of the identified set. On the other hand, moment-condition based likelihoods such as the limited information and exponentially tilted empirical likelihoods, although they guarantee consistency, lack a probabilistic interpretation. Our approach only requires a set of moment conditions but still possesses a pure Bayesian interpretation.

Our analysis focuses on identified sets which are closed and convex. These sets are completely characterized by their support function, and efficient estimation of the support function may lead to optimality of estimation and inference for the identified set. By imposing a prior on the support function, we construct its posterior distribution. It is shown that the support function for a very general moment inequality model admits a linear expansion, and that the posterior is consistent. The Bernstein-von Mises theorem is proven.

Note that in this paper we consider a fixed data generating process (DGP). The constructed BCS has asymptotically correct coverage probability for any specific DGP, and the uniformity issue as in Andrews and Soares (2010) is not considered. The semi-parametric posterior concentration theory has often been developed for a specific DGP even under point identification, and relies on the existence of certain exponential tests and Schwartz's theorem (see e.g., Wu and Ghosal 2008, Ghosh and Ramamoorthi 2003, Ghosal and van der Vaart 2001, Shen and Wasserman 2001). Moreover, deriving the asymptotic representation of the support function for a fixed DGP is already technically involved. Extending these results uniformly over a class of DGPs would be a challenging problem. We plan to address this issue in future research.



A Support functions for two examples 

A.l Support function for the interval regression model 

Consider Example 12.21 We now derive the support function for the identified set. 
Lemma A.l. Suppose <\> 2 X exists, then 

-1/01 + 03 



9(0) 



9eQ:9 



u),u e 



'1 <P3 



Proof. Defline £ = 02#. Then 6 = <p 2 l £. Let u = £ — Then the identified set can be 

written as: 9(0) = {02 ^ : 0i < f < 3 } = {02 +u):cj) 1 <u+ < 3 } . This 

then gives the result. 



□ 



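Now we are ready to calculate the support function for $\Theta(\phi)$.

Theorem A.1. Suppose $\phi_2^{-1}$ exists. The support function for $\Theta(\phi)$ is given by:

$$S_{\Theta(\phi)}(p) = p^T \phi_2^{-1} \left( \frac{\phi_1 + \phi_3}{2} \right) + \alpha_p^T \left( \frac{\phi_3 - \phi_1}{2} \right),$$

where $d = \dim(\theta)$, $\mathrm{sgn}(x) = I(x > 0) - I(x < 0)$, and

$$\alpha_p = \left( (p^T \phi_2^{-1})_1 \, \mathrm{sgn}(p^T \phi_2^{-1})_1, \ldots, (p^T \phi_2^{-1})_d \, \mathrm{sgn}(p^T \phi_2^{-1})_d \right)^T.$$

Proof. The proof is based on straightforward calculations. Let $\Delta = (\phi_3 - \phi_1)/2$, then

$$S_{\Theta(\phi)}(p) = \sup_{\theta \in \Theta(\phi)} p^T \theta = p^T \phi_2^{-1} \left( \frac{\phi_1 + \phi_3}{2} \right) + \sup_{-\Delta \le u \le \Delta} p^T \phi_2^{-1} u.$$

In addition,

$$\sup_{-\Delta \le u \le \Delta} p^T \phi_2^{-1} u = \sup_{-\Delta \le u \le \Delta} \left[ \sum_{i : (p^T \phi_2^{-1})_i < 0} (p^T \phi_2^{-1})_i u_i + \sum_{i : (p^T \phi_2^{-1})_i > 0} (p^T \phi_2^{-1})_i u_i \right] = \sum_{i : (p^T \phi_2^{-1})_i > 0} (p^T \phi_2^{-1})_i \Delta_i - \sum_{i : (p^T \phi_2^{-1})_i < 0} (p^T \phi_2^{-1})_i \Delta_i, \quad (A.1)$$

which proves the theorem. □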
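As a numerical sanity check of the closed form in Theorem A.1, the following minimal sketch compares it with a brute-force maximization of $p^T \theta$ over the vertices of the box $\phi_1 \le \phi_2 \theta \le \phi_3$ (a linear function on a box attains its maximum at a vertex). The function names and the numerical values of $\phi_1$, $\phi_2^{-1}$, $\phi_3$ used in the check are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from itertools import product


def support_function(p, phi1, phi2_inv, phi3):
    """Closed form: p' phi2^{-1} (phi1+phi3)/2 + sum_i |(p' phi2^{-1})_i| (phi3-phi1)_i / 2."""
    d = len(p)
    # Row vector w = p^T phi2^{-1}.
    w = [sum(p[k] * phi2_inv[k][i] for k in range(d)) for i in range(d)]
    mid = [(phi1[i] + phi3[i]) / 2 for i in range(d)]
    half = [(phi3[i] - phi1[i]) / 2 for i in range(d)]
    # w_i * sgn(w_i) equals |w_i|, which is the i-th entry of alpha_p.
    return sum(w[i] * mid[i] for i in range(d)) + sum(abs(w[i]) * half[i] for i in range(d))


def support_function_brute(p, phi1, phi2_inv, phi3):
    """Maximize p' theta over the vertices xi in {phi1_i, phi3_i}^d, with theta = phi2^{-1} xi."""
    d = len(p)
    best = float("-inf")
    for xi in product(*[(phi1[i], phi3[i]) for i in range(d)]):
        theta = [sum(phi2_inv[r][c] * xi[c] for c in range(d)) for r in range(d)]
        best = max(best, sum(p[r] * theta[r] for r in range(d)))
    return best
```

The brute-force version enumerates $2^d$ vertices, so it is only meant as a check in small dimensions; the closed form costs $O(d^2)$.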



A.2 Support function for the financial asset pricing model

Let $p \in \mathbb{S}^d$ and denote $p = (p_1, p_2)$ and $D = \phi_2^2 - \phi_1 \phi_3$. Given the particular form that the parameters $\phi_1$, $\phi_2$ and $\phi_3$ take in our example and by the Cauchy-Schwarz inequality we have: $\phi_2^2 \le \phi_1 \phi_3$. This implies:

$$D = \phi_2^2 - \phi_1 \phi_3 \le 0.$$

The support function is given by

$$S_\phi(p) = p^T \Xi(p, \phi), \quad \|p\| = 1.$$

Here $\Xi(p, \phi)$ is determined by:
1. for p 2 > 0, pi > 0: 

(p.,a 2 ) if a 2 > 0i/I 2 - 20 2 /2 + 03 

S(p, = < 



<t>2+\/ D + lp!^ 2 _ 2 I 

*-r ,CT Z if cH < <p X ^ - 202^ 



2. for p 2 < 0, pi < 0: 

' (> ' & - f ) , if 02 > 0, and £ < 2 /(0 3 < a 2 ) + ^DT^I(4> 3 > a 2 ) 



S(P, 0) = < 



-1(03 > 03/(03 < ^) + ^(03 > S 2 ) 



otherwise. 



3. for p 2 <0,pi> 0: 

3.1. H(p, 0) = (g - ^ - f + 3 ) , if I and II below are satisfied: 

I. 202 - 20i^i < £i < 202 and 

II. for u = — „ p \ one of the following two conditions is verified: 

r~ < p 1 2p 2 <pi ° 

Il.a D < 0, D + 1( r 2 > and W^^ 2 < ^ < H+yjD+^ 



lib £> = and WfW 2 < M < W^+^ 2 
3.2. 3(p, 0) = (0, 3 /(03 < a 2 ) + ^7(03 > a 2 )) , if £ > 2 2 ; 
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3.3. H(p, <f>) = (/i, </>i/i 2 — 202/i + 03 ) if 2cj)2 — 20i/i > jji and either II. a or II. b above is satisfied for 

3.4. (imaginary solution) S(p, 0) = ( W^+^g ^ if 202-20^ > 2|, £> < and £> + 0io- 2 < 0; 



3.6. E(p,4>) = (/2,ct 2 ) if 2</) 2 - 201/2 > 2i, Z? + 0ict 2 > and either /2 < 02 ^j^ 01 or /2 > 

02+y / -P+ff 2 0i . 

4>i ' 

3.7. (imaginary solution) S(p, 0) = ^ ^g+V^+^i^ ^ ^.2^ ^ j a ]-, ove j s satisfied, D < and £> + 0icr 2 < 
0; 

3.8. E(p,0) = - 2^7^ 2 ) if 1 above is satisfied, D < 0, D + x ct 2 > and cither n < 

02~V-P + 0lg 2 Qr 02 + y / £' + 0lg 2 for 02 _ Pi . 

01 ' 01 ^ 01 2p 2 0l' 



3.9. E(p,0) = — 2 P ^0i ' °" 2 ) ^ ^ above is satisfied, D = and either /i < — — or /i > 



02+V0lff 2 for 02 _ Pi . 

01 " 01 2p 2 01 ' 



4. P 2>0, P1 <0: H(^)^^VW^ 3> , 2)i ,2 



5. p 2 — 0, pi = I: 

(/2, ct 2 ) Vo- 2 € [0i/i 2 - 20 2 ii + 03, ct 2 ] if <J 2 > 0iAl 2 - 202M + 3 

S(p,0) 



02 + VAD + 01CT 2 . o \ ... > > , 

^ ,cH if (7 < (f>lfl — 202M 



6. p 2 = 0, pi = -1: 

Va^ , : ,^] ,[,, 

S(p,0) 



02- V -P+01Q- 2 -2 



02-V-P + 0lff 2 \ . / - 02 + y A P + 



7. p 2 = 1, Pi = 0: E(p,0) = (/1, c ), V> € [max I 0, ^ J ,min ( fj, 

8. pa = -l,pi = 0: 

(ff>^3-f) if 02 >0 and 



6l(T^ 



1^3 - <P 2 



S(p,0) = 



(0, 3 ) if 02 < 



The linearization of the support function given in Theorem 5.1 remains valid despite the fact that Assumption 5.5 is not satisfied for three values of $p \in \mathbb{S}^2$, that is, $p = (\pm 1, 0)$ and $p = (0, 1)$. Denote $\mathbb{S}_{ns} := \{(1, 0), (-1, 0), (0, 1)\} \subset \mathbb{S}^2$. For $p \in \mathbb{S}^2 \setminus \mathbb{S}_{ns}$ the proof of the result in Theorem 5.1 remains unchanged.

For $p \in \mathbb{S}_{ns}$ the proof is the same as the proof of Theorem 5.1 except for the proof of some intermediate results which we now detail.



Proof of Lemma G.14: by using the notation in this lemma, we have to show that



dr- 



dr- 



(A.2) 



We refer to the expressions given in (G.14) and (G.15). For $p \in \mathbb{S}_{ns}$, $\Xi(p, \phi_{\tau_0})$ is not a singleton. However, since $\nabla_\phi \Psi(\theta, \phi_\tau)$ does not depend on $\sigma^2$ and since for $p = (\pm 1, 0)$, $\Xi(p, \phi)$ is not a singleton only in the dimension of $\sigma^2$, we still get the equality (A.2). For $p = (0, 1)$, $\lambda(p, \phi_{\tau_0}) = 0$ so, by using (G.14) and (G.15), this implies that the equality (A.2) still holds.



Proof of Lemma G.13: the proof does not change except for the analysis of term $A_2$ in CASE II. Let us start by considering $p = (1, 0)$ which corresponds to case 5 above. If $\Xi(p, \phi_0) = (\bar{\mu}, \sigma^2)$, $\forall \sigma^2 \in (\phi_{01} \bar{\mu}^2 - 2 \phi_{02} \bar{\mu} + \phi_{03}, \bar{\sigma}^2]$, then the constraint is not binding so that $\lambda(p, \phi_0) = 0$ and $A_2 = 0$. If we are in the other case, then $\Xi(p, \phi)$ is a singleton in $\mu$. Due to this and to the continuity of $\nabla_\phi \Psi(\theta, \phi)$ in $\theta$, the term $[\nabla_\phi \Psi(\theta, \phi_0) - \nabla_\phi \Psi(\theta_*(p), \phi_0)] = 0$ and $A_2 = 0$. Proving that $A_2 = 0$ for $p = (-1, 0)$ and $p = (0, 1)$ proceeds in exactly the same way and is therefore omitted.



B Posterior Concentration for $\phi$

Much of the literature on posterior concentration rates for Bayesian nonparametrics relies on the notion of entropy cover number, which we now define. Recall that for i.i.d. data, the likelihood function can be written as $l_n(\phi, \eta) = \prod_{i=1}^n l(X_i; \phi, \eta)$, where $l(x; \phi, \eta)$ denotes the density of the sampling distribution. Let

$$G = \{ l(\cdot; \phi, \eta) : \phi \in \Phi, \eta \in \mathcal{P} \}$$

be the family of likelihood functions. We assume $\mathcal{P}$ is a metric space with a norm $\|\cdot\|_\eta$, which then induces a norm $\|\cdot\|_G$ on $G$ such that $\forall l(\cdot; \phi, \eta) \in G$,

$$\| l(\cdot; \phi, \eta) \|_G = \|\phi\| + \|\eta\|_\eta.$$

In the examples of interval censored data and interval regression, $l(x; \phi, \eta) = \eta(x - \phi)$ and $\|\eta\|_\eta = \|\eta\|_1 = \int |\eta(x)| dx$. Then in this case $\| l(\cdot; \phi, \eta) \|_G = \|\phi\| + \|\eta\|_1$. Let $B(l, \rho)$ denote a closed ball in $G$ centered at $l \in G$ with radius $\rho$.

Define the entropy cover number $N(\rho, G, \|\cdot\|_G)$ to be the minimum number of balls with radius $\rho$ needed to cover $G$. The importance of the entropy cover number for nonparametric Bayesian asymptotics has long been recognized. We refer the reader to Tikhomirov (1961) and van der Vaart and Wellner (1996) for good early references.

We first present the assumptions that are sufficient to derive the posterior concentration rate for the point identified $\phi$. The first one is placed on the entropy cover number.

Assumption B.1. Suppose for all $n$ large enough,

$$N(n^{-1/2} (\log n)^{1/2}, G, \|\cdot\|_G) \le n.$$

This condition requires that the "model" $G$ be not too big. Once this condition holds, then for all $r_n \ge n^{-1/2} (\log n)^{1/2}$, $N(r_n, G, \|\cdot\|_G) \le \exp(n r_n^2)$. Moreover, it ensures the existence of certain tests as given in Lemma B.2 below, and hence it can be replaced by the test conditions that are commonly used in the literature on posterior concentration rates, e.g., Jiang (2007), Ghosh and Ramamoorthi (2003). The same condition has been imposed by Ghosal et al. (2000) when considering Hellinger rates, and Bickel and Kleijn (2012) when considering semi-parametric posterior asymptotic normality, among others. When $\eta_0$ belongs to the family of location mixtures, this condition was verified by Ghosal et al. (1999, Theorem 3.1).

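The next assumption places conditions on the prior for $(\phi, \eta)$. For each $(\phi, \eta)$, define

$$K_{\phi,\eta} = E\left[ \log \frac{l(X; \phi_0, \eta_0)}{l(X; \phi, \eta)} \,\Big|\, \phi_0, \eta_0 \right] = \int \log \frac{l(x; \phi_0, \eta_0)}{l(x; \phi, \eta)} \, l(x; \phi_0, \eta_0) dx,$$

$$V_{\phi,\eta} = \mathrm{var}\left[ \log \frac{l(X; \phi_0, \eta_0)}{l(X; \phi, \eta)} \,\Big|\, \phi_0, \eta_0 \right] = \int \log^2 \left( \frac{l(x; \phi_0, \eta_0)}{l(x; \phi, \eta)} \right) l(x; \phi_0, \eta_0) dx - K_{\phi,\eta}^2.$$

Assumption B.2. The prior $\pi(\phi, \eta)$ satisfies:

$$\pi\left( K_{\phi,\eta} \le \frac{\log n}{n},\ V_{\phi,\eta} \le \frac{\log n}{n} \right) n^M \to \infty$$

for some $M > 2$.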



Intuitively, when $(\phi, \eta)$ is close to $(\phi_0, \eta_0)$, both $K_{\phi,\eta}$ and $V_{\phi,\eta}$ are close to zero. Hence this assumption requires the prior to have a sufficient amount of support around the true point identified parameters in terms of the Kullback-Leibler distance. Such a prior condition, stated through Kullback-Leibler neighborhoods defined by bounds on $K_{\phi,\eta}$ and $V_{\phi,\eta}$, has also been commonly imposed in the literature on semi-parametric posterior concentration, e.g., Ghosal et al. (1999 (2.10), 2000 Condition 2.4), Shen and Wasserman (2001, Theorem 2) and Bickel and Kleijn (2012, (3.13)). Moreover, it has been verified in the literature for the sieve prior (Shen and Wasserman 2001), the Dirichlet mixture prior (Ghosal et al. 1999) and the Normal mixture prior (Ghosal and van der Vaart 2007).

We are now ready to present the posterior concentration rate for $\phi$.

Theorem B.1. Suppose the data $X_1, \ldots, X_n$ are i.i.d. Under Assumptions B.1 and B.2, for some $C > 0$,

$$P(\|\phi - \phi_0\| \le C n^{-1/2} (\log n)^{1/2} \mid D_n) \to^p 1.$$

The proof of this theorem requires two technical lemmas. The first is taken from Shen and Wasserman (2001).

Lemma B.1. Under Assumption B.2,

$$P_{D_n}\left( \int\!\!\int \frac{l_n(\phi, \eta)}{l_n(\phi_0, \eta_0)} \pi(d\phi, d\eta) \ge \frac{1}{n^2} \pi\left( K_{\phi,\eta} \le \frac{\log n}{n},\ V_{\phi,\eta} \le \frac{\log n}{n} \right) \right) \to 1.$$

Proof. The proof follows the same argument as that of Lemma 1 in Shen and Wasserman (2001), and hence is omitted. □

The following lemma concerns the existence of an exponential test, which is essential in establishing posterior consistency and concentration rates in the nonparametric Bayesian literature. The idea of using an exponential test for posterior consistency dates back at least to Schwartz (1965).

For a function of the data $T(D_n)$, define

$$E_{\phi,\eta} T(D_n) = E[T(D_n) \mid \phi, \eta] = \int T(x) l_n(x; \phi, \eta) dx.$$

Lemma B.2. Under Assumption B.1 there exist a test $T$ and a constant $L$ with $L > 4$ and $L > M + 2$ (for $M$ defined in Assumption B.2) such that

(i) $E_{\phi_0,\eta_0} T = o(1)$;

(ii) for $r_n = \sqrt{(\log n)/n}$,

$$\sup_{\eta \in \mathcal{P},\ \|\phi - \phi_0\| \ge L r_n} E_{\phi,\eta}(1 - T) \le \exp\left( -\frac{9}{16} L^2 n r_n^2 \right).$$

Proof. For any natural number $j$, and some $L > 0$, define

$$H_j = \{ l(\cdot; \phi, \eta) \in G : \eta \in \mathcal{P},\ j L r_n \le \|\phi_0 - \phi\| < (j + 1) L r_n \}.$$

We cover $H_j$ using balls of the form $B(g, r) = \{ l \in G : \|l - g\|_G \le r \}$ for some small $r = 4^{-1} j L r_n$ and center $g \in G$. Then $H_j$ can be covered by $N_j$ (to be characterized later) balls $B(g_{ji}, 4^{-1} j L r_n)$, with centers $g_{j1}, \ldots, g_{j N_j} \in H_j$:

$$H_j \subset \bigcup_{i=1}^{N_j} B(g_{ji}, 4^{-1} j L r_n).$$

Let those centers be chosen such that $N_j$ is the minimum number needed to make such a cover. Let $l_0 = l(\cdot; \phi_0, \eta_0)$. Then for any ball $B(g_{ji}, 4^{-1} j L r_n)$, the center satisfies $\|g_{ji} - l_0\|_G^2 \ge \|\phi_{ji} - \phi_0\|^2 \ge j^2 L^2 r_n^2$. The last inequality follows since $g_{ji} \in H_j$. Then for any $l(\cdot; \phi, \eta) \in B(g_{ji}, 4^{-1} j L r_n)$, $\|l - l_0\|_G \ge \|l_0 - g_{ji}\|_G - \|l - g_{ji}\|_G \ge j L r_n - 4^{-1} j L r_n = \frac{3}{4} j L r_n$. So we have shown that each element in the small ball $B(g_{ji}, 4^{-1} j L r_n)$ is at least $3 j L r_n / 4$ away from $l_0$, and due to the convexity of such a ball, by the standard minimax result (see Le Cam, 1986, Birgé, 1983), there exists a test $T_{ji}$ such that

$$\max\left\{ E_{\phi_0,\eta_0} T_{ji},\ \sup_{l \in B(g_{ji}, 4^{-1} j L r_n)} E_{\phi,\eta}(1 - T_{ji}) \right\} \le \exp\left( -n\, d(l_0, B(g_{ji}, 4^{-1} j L r_n))^2 \right) \le \exp\left( -n \frac{9}{16} j^2 L^2 r_n^2 \right),$$

where $d(l_0, B) = \inf_{l \in B} \|l - l_0\|_G$, and we have shown that $d(l_0, B(g_{ji}, 4^{-1} j L r_n)) \ge \frac{3}{4} j L r_n$.

Now define

$$T = \sup_{j \ge 1} \max_{i \le N_j} T_{ji}.$$

Then for any $l = l(\cdot; \phi, \eta)$ such that $\|\phi - \phi_0\| \ge L r_n$, there exists $j^* \ge 1$ so that $l \in H_{j^*}$. By the cover, there exist $i^* \le N_{j^*}$ and a ball so that $l \in B(g_{j^* i^*}, 4^{-1} j^* L r_n)$. Since $1 - T \le 1 - T_{ji}$ for any $j, i$,

$$E_{\phi,\eta}(1 - T) \le \sup_{l \in B(g_{j^* i^*}, 4^{-1} j^* L r_n)} E_{\phi,\eta}(1 - T_{j^* i^*}) \le \exp\left( -n \frac{9}{16} j^{*2} L^2 r_n^2 \right) \le \exp\left( -n \frac{9}{16} L^2 r_n^2 \right).$$

Hence

$$\sup_{\|\phi_0 - \phi\| \ge L r_n,\ \eta \in \mathcal{P}} E_{\phi,\eta}(1 - T) \le \exp\left( -n \frac{9}{16} L^2 r_n^2 \right).$$

This proves the second assertion (type II error) of the lemma.

For the first assertion (type I error), $E_{\phi_0,\eta_0} T \le \sum_{j \ge 1} \sum_{i \le N_j} E_{\phi_0,\eta_0} T_{ji} \le \sum_j N_j \exp(-n \frac{9}{16} j^2 L^2 r_n^2)$. Note that $N_j = N(4^{-1} j L r_n, H_j, \|\cdot\|_G) \le N(4^{-1} j L r_n, G, \|\cdot\|_G) \le N(r_n, G, \|\cdot\|_G) \le \exp(n r_n^2)$, where we used $L > 4$ so that $4^{-1} j L > 1$, and the fact that the covering number is larger when the radius is smaller. Hence

$$E_{\phi_0,\eta_0} T \le \sum_j N_j \exp\left( -n \frac{9}{16} j^2 L^2 r_n^2 \right) \le \exp(n r_n^2) \sum_{j \ge 1} \exp\left( -\frac{9}{16} n j^2 L^2 r_n^2 \right) = o(1).$$

This is $o(1)$ since $L > 4$ and $n r_n^2 \to \infty$. □
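To see how quickly the type I error bound at the end of the proof vanishes, the sketch below evaluates $\exp(n r_n^2) \sum_{j \ge 1} \exp(-\frac{9}{16} n j^2 L^2 r_n^2)$ with $n r_n^2 = \log n$. The function name, the truncation of the series at a finite number of terms, and the values of $n$ and $L$ used to exercise it are illustrative assumptions, not part of the original argument.

```python
import math


def type_one_bound(n, L, terms=50):
    """Truncated value of exp(n r_n^2) * sum_j exp(-(9/16) n j^2 L^2 r_n^2), with n r_n^2 = log n."""
    nr2 = math.log(n)  # n * r_n^2 when r_n = sqrt(log(n) / n)
    total = 0.0
    for j in range(1, terms + 1):
        # Combine the two exponents before exponentiating to keep the terms well scaled.
        total += math.exp(nr2 - (9.0 / 16.0) * nr2 * j * j * L * L)
    return total
```

Because every term equals $n^{1 - (9/16) j^2 L^2}$ and $L > 4$, the $j = 1$ term dominates and the whole sum decays polynomially fast in $n$.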

Proof of Theorem B.1

Proof. Let $E = E_{\phi_0,\eta_0}$ be the expectation operator with respect to the distribution of the data, given the true parameters. For some $M > 0$, let $U$ denote the ball centered at $\phi_0$ with radius $M n^{-1/2} (\log n)^{1/2}$. Denote by $U^c$ the complement of $U$. Then it suffices to show that for some $M > 0$, $E P(\phi \in U^c \mid D_n) = o(1)$. In fact, for the test $T$ in Lemma B.2,

$$E P(\phi \in U^c \mid D_n) = E[P(\phi \in U^c \mid D_n) T] + E[P(\phi \in U^c \mid D_n)(1 - T)] \le E T + E P(\phi \in U^c \mid D_n)(1 - T) = o(1) + E P(\phi \in U^c \mid D_n)(1 - T).$$

The last equality follows from Lemma B.2 (i). Let

$$\beta_n = \pi\left( K_{\phi,\eta} \le \frac{\log n}{n},\ V_{\phi,\eta} \le \frac{\log n}{n} \right) \frac{1}{n^2},$$

and define an event

$$A = \left\{ \int\!\!\int \frac{l_n(\phi, \eta)}{l_n(\phi_0, \eta_0)} \pi(d\phi, d\eta) \ge \beta_n \right\}.$$

Then Lemma B.1 shows that $P_{D_n}(A) \to 1$. In addition, $\exp(-L n r_n^2) \beta_n^{-1} = o(1)$ for $L > M + 2$ in Lemma B.2, by Assumption B.2. Then

$$E[P(\phi \in U^c \mid D_n)(1 - T)] = E[P(\phi \in U^c \mid D_n)(1 - T) I_A] + E[P(\phi \in U^c \mid D_n)(1 - T) I_{A^c}] \le E P(\phi \in U^c \mid D_n)(1 - T) I_A + 2 E I_{A^c} \le E P(\phi \in U^c \mid D_n)(1 - T) I_A + o(1).$$

The last inequality follows from $P_{D_n}(A) \to 1$.

It remains to upper bound $E P(\phi \in U^c \mid D_n)(1 - T) I_A$. We need to lower bound the denominator of the posterior probability, and upper bound the numerator as well. Because of $I_A$, the lower bound of the denominator can be realized on $A$. Then

$$E[P(\phi \in U^c \mid D_n)(1 - T) I_A] \le \beta_n^{-1} E\left[ \int\!\!\int_{U^c \times \mathcal{P}} \frac{l_n(\phi, \eta)}{l_n(\phi_0, \eta_0)} \pi(d\eta, d\phi)(1 - T) \right]$$

$$= \beta_n^{-1} \int_{\mathcal{X}} \int\!\!\int_{U^c \times \mathcal{P}} \prod_{i=1}^n l(x_i; \phi_0, \eta_0) \frac{l_n(\phi, \eta)}{l_n(\phi_0, \eta_0)} (1 - T) \pi(d\eta, d\phi) dx_1 \ldots dx_n$$

$$= \beta_n^{-1} \int\!\!\int\!\!\int_{\mathcal{X} \times U^c \times \mathcal{P}} \prod_{i=1}^n l(x_i; \phi, \eta)(1 - T) \pi(d\eta, d\phi) dx_1 \ldots dx_n.$$

Here we used the fact that $E V = E_{\phi_0,\eta_0} V = \int V \prod_i l(x_i; \phi_0, \eta_0) dx_1 \ldots dx_n$. Also note that $\int V \prod_i l(x_i; \phi, \eta) dx_1 \ldots dx_n = E_{\phi,\eta} V$. Using Fubini's theorem to change the order of integration, the expression above also equals:

$$\beta_n^{-1} \int\!\!\int_{U^c \times \mathcal{P}} E_{\phi,\eta}(1 - T) \pi(d\eta, d\phi) \le \beta_n^{-1} \pi(\phi \in U^c) \sup_{\phi \in U^c,\ \eta \in \mathcal{P}} E_{\phi,\eta}(1 - T) \le \exp(-L n r_n^2) \beta_n^{-1} = o(1).$$

C Proofs for Section 3

C.1 Proof of Theorem 3.1

In this proof we use the notation $\iota\epsilon$ to denote: $\epsilon$ if $\iota = 1$ and $-\epsilon$ if $\iota = -1$. We start by stating and proving the following lemma.

Lemma C.1. Let $\pi(\theta \mid \phi)$ be a regular conditional distribution. Under Assumption 3.2 there exists a sequence of bounded and continuous functions $\{h_{m,\iota}(\phi)\}_m$ defined on $\Phi$ for $\iota \in \{-1, 1\}$ such that $|h_{m,\iota}(\phi)| \le 1$ and

$$\pi(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid \phi) = \lim_{m \to \infty} h_{m,\iota}(\phi), \quad \pi(\phi)\text{-a.s.}$$

for $\iota \in \{-1, 1\}$.

Proof. Denote by $C(\Phi)$ the set of continuous functions on $\Phi$. Since $\pi(\theta \mid \phi)$ is a regular conditional distribution, there exists a transition probability from $(\Phi, \mathcal{B}_\Phi)$ to $(\Theta, \mathcal{B}_\Theta)$ that characterizes it, where $\mathcal{B}_\Phi$ and $\mathcal{B}_\Theta$ denote the $\sigma$-fields associated with $\Phi$ and $\Theta$, respectively. This means that $\pi(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid \phi)$ is a measurable function of $\phi$ for $\iota \in \{-1, 1\}$.

Next, remark that if $\phi \notin A_{\epsilon,\iota}$ then $\Theta(\phi_0)^{\iota\epsilon}$ is not supported by the conditional prior distribution $\pi(\theta \mid \phi)$. Therefore, $\pi(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid \phi) = 0$, $\forall \phi \notin A_{\epsilon,\iota}$. It follows by Lusin's theorem (see e.g. Rudin (1989), page 55) that, on a compact set $K_\iota \subset (\Phi, \mathcal{B}_\Phi)$ of almost full $\pi(\phi)$-probability, $\pi(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid \phi)$ is equal to a continuous function of $\phi$. Finally, since for any $\phi \in \Phi$, $|\pi(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid \phi)| \le 1$, by the corollary on page 56 in Rudin (1986), there exists a sequence $h_{m,\iota} \in C(\Phi)$, $|h_{m,\iota}| \le 1$ such that $\pi(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid \phi) = \lim_{m \to \infty} h_{m,\iota}(\phi)$, $\pi(\phi)$-a.s.

□

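Under Assumption 3.2 and by Lemma C.1 we can apply the Dominated Convergence Theorem so that $\lim_{m \to \infty} h_{m,\iota}(\phi) = \pi(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid \phi)$, $\pi(\phi)$-a.s. implies

$$\lim_{m \to \infty} \int h_{m,\iota}(\phi) \pi(\phi) d\phi = \int \lim_{m \to \infty} h_{m,\iota}(\phi) \pi(\phi) d\phi.$$

We consider $P(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid D_n)$ and show that it converges to 1 for $\iota = 1$ and to something smaller than 1 for $\iota = -1$. Under Assumption 3.2 and by Lemma C.1 this probability can be rewritten as

$$P(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid D_n) = \int \pi(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid \phi) \pi(\phi \mid D_n) d\phi = \int \lim_{m \to \infty} h_{m,\iota}(\phi) \pi(\phi \mid D_n) d\phi = \lim_{m \to \infty} \int h_{m,\iota}(\phi) \pi(\phi \mid D_n) d\phi. \quad (C.1)$$

We analyse separately the case with a nonparametric prior and the case with a semi-parametric prior.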

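Nonparametric prior. In this case, Assumption 3.1 (i) holds. The expression in (C.1) must be developed further:

$$P(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid D_n) = \lim_{m \to \infty} \int h_{m,\iota}(\phi(F)) \pi(F \mid D_n) dF.$$

Therefore, since $\phi$ is a continuous function of $F$ (by the first part of Assumption 3.1 (i)), the composed function $h_{m,\iota} \circ \phi$ is a continuous and bounded function of $F$, and under the second part of Assumption 3.1 (i) we obtain

$$\lim_{n \to \infty} P(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid D_n) = \lim_{n \to \infty} \lim_{m \to \infty} \int h_{m,\iota}(\phi(F)) \pi(F \mid D_n) dF$$

$$= \lim_{m \to \infty} \int h_{m,\iota}(\phi(F)) \lim_{n \to \infty} \pi(F \mid D_n) dF = \lim_{m \to \infty} \int h_{m,\iota}(\phi(F)) \delta_{F_0}(dF)$$

$$= \lim_{m \to \infty} h_{m,\iota}(\phi(F_0)) = \pi(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid \phi(F_0)),$$

where $\delta_{F_0}$ denotes the Dirac mass at $F_0$. Since $\pi(\theta \mid \phi(F_0))$ has support equal to $\Theta(\phi_0)$ and $\Theta(\phi_0)^{-\epsilon} \subset \Theta(\phi_0) \subset \Theta(\phi_0)^{\epsilon}$, then by using Assumption 3.3

$$\lim_{n \to \infty} P(\theta \in \Theta(\phi_0)^{\epsilon} \mid D_n) = 1 \quad \text{and} \quad \lim_{n \to \infty} P(\theta \in \Theta(\phi_0)^{-\epsilon} \mid D_n) < 1, \quad P\text{-a.s.}$$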

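Semi-parametric prior. In this case, Assumption 3.1 (ii) holds. Since $h_{m,\iota}(\cdot)$ is a continuous function of $\phi$, equation (C.1) and Assumption 3.1 (ii) imply

$$\lim_{n \to \infty} P(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid D_n) = \lim_{n \to \infty} \lim_{m \to \infty} \int_\Phi h_{m,\iota}(\phi) \pi(\phi \mid D_n) d\phi$$

$$= \lim_{m \to \infty} \int_\Phi h_{m,\iota}(\phi) \lim_{n \to \infty} \pi(\phi \mid D_n) d\phi = \lim_{m \to \infty} \int_\Phi h_{m,\iota}(\phi) \delta_{\phi_0}(d\phi)$$

$$= \lim_{m \to \infty} h_{m,\iota}(\phi_0) = \pi(\theta \in \Theta(\phi_0)^{\iota\epsilon} \mid \phi_0),$$

where $\delta_{\phi_0}$ denotes the Dirac mass at $\phi_0$. Since $\pi(\theta \mid \phi_0)$ has support equal to $\Theta(\phi_0)$, then by using Assumption 3.3

$$\lim_{n \to \infty} P(\theta \in \Theta(\phi_0)^{\epsilon} \mid D_n) = 1 \quad \text{and} \quad \lim_{n \to \infty} P(\theta \in \Theta(\phi_0)^{-\epsilon} \mid D_n) < 1, \quad P\text{-a.s.}$$

This concludes the proof.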

D Proofs for Section 4 

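D.1 Proof of Theorem 4.1

Define $Q(\theta, \phi) = \|\max(\Psi(\theta, \phi), 0)\| = \left[ \sum_{i=1}^k (\max(\Psi_i(\theta, \phi), 0))^2 \right]^{1/2}$.

Lemma D.1. There exists $C > 0$ such that, for any $\phi_1, \phi_2 \in \Phi$,

$$\sup_{\theta \in \Theta} |Q(\theta, \phi_1) - Q(\theta, \phi_2)| \le C \|\phi_1 - \phi_2\|.$$

Proof. Define $f(x) = \max(x, 0) = x I(x > 0)$, where $x \in \mathbb{R}$. It is straightforward to show that $\forall x_1, x_2$, $|f(x_1) - f(x_2)| \le |x_1 - x_2|$. On the other hand, for any $\phi_1, \phi_2 \in \Phi$,

$$|Q(\theta, \phi_1) - Q(\theta, \phi_2)| = \big| \|\max(\Psi(\theta, \phi_1), 0)\| - \|\max(\Psi(\theta, \phi_2), 0)\| \big|$$

$$\le \|\max(\Psi(\theta, \phi_1), 0) - \max(\Psi(\theta, \phi_2), 0)\|$$

$$= \left( \sum_{i=1}^k [f(\Psi_i(\theta, \phi_1)) - f(\Psi_i(\theta, \phi_2))]^2 \right)^{1/2} \le \left( \sum_{i=1}^k [\Psi_i(\theta, \phi_1) - \Psi_i(\theta, \phi_2)]^2 \right)^{1/2}$$

$$= \|\Psi(\theta, \phi_1) - \Psi(\theta, \phi_2)\| \le C \|\phi_1 - \phi_2\|, \quad (D.1)$$

where $C$ does not depend on $\theta$, by Assumption 4.3.

□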
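The first two inequalities in (D.1) do not depend on the model: they say that $a \mapsto \|\max(a, 0)\|$ is 1-Lipschitz in the Euclidean norm. The sketch below checks this numerically on random vectors standing in for $\Psi(\theta, \phi_1)$ and $\Psi(\theta, \phi_2)$. The function names, the sampling range and the number of trials are illustrative assumptions; the final step of (D.1), which uses Assumption 4.3, is not covered by this check.

```python
import math
import random


def Q(psi):
    """Criterion ||max(psi, 0)||: Euclidean norm of the positive parts of the constraints."""
    return math.sqrt(sum(max(v, 0.0) ** 2 for v in psi))


def norm_diff(a, b):
    """Euclidean distance between two constraint vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def check_chain(trials=1000, k=4, seed=0):
    """Return True if |Q(a) - Q(b)| <= ||a - b|| holds on every random draw."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.uniform(-2.0, 2.0) for _ in range(k)]
        b = [rng.uniform(-2.0, 2.0) for _ in range(k)]
        # Small tolerance guards against floating-point rounding only.
        if abs(Q(a) - Q(b)) > norm_diff(a, b) + 1e-12:
            return False
    return True
```

A random search of this kind cannot prove the inequality, but a single failing draw would flag an error in the chain.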

Lemma D.2. There exists a closed neighborhood $U(\phi_0)$ such that, for any $a_n = O(1)$, there exists $K > 0$ that does not depend on $\phi$, so that

$$\inf_{\phi \in U(\phi_0)} \ \inf_{d(\theta, \Theta(\phi)) \ge K a_n} \ \max_{i \le k} \Psi_i(\theta, \phi) > a_n.$$

Proof. For any $C > 0$, define $A_C = \{ \phi \in U(\phi_0) : \inf_{\theta : d(\theta, \Theta(\phi)) \ge C a_n} \max_{i \le k} \Psi_i(\theta, \phi) > a_n \}$. Then by Assumption 4.4, $\forall \phi \in U(\phi_0)$, there exists $C_\phi > 0$ so that $\phi \in A_{C_\phi}$. Thus, we have $U(\phi_0) \subset \bigcup_{\phi \in U(\phi_0)} A_{C_\phi}$. Since $U(\phi_0)$ is a closed neighborhood inside $\mathbb{R}^{\dim(\phi)}$, which is compact, there exist constants $C_1, \ldots, C_N$ for some finite $N > 0$ that form a finite cover, so that $U(\phi_0) \subset \bigcup_{j=1}^N A_{C_j}$. Then $\forall \phi \in U(\phi_0)$, there exists $j \le N$ so that $\phi \in A_{C_j}$, that is, $\inf_{\theta : d(\theta, \Theta(\phi)) \ge C_j a_n} \max_{i \le k} \Psi_i(\theta, \phi) > a_n$. On the other hand, let $K = \max\{C_j : j \le N\}$, then

$$\inf_{\theta : d(\theta, \Theta(\phi)) \ge K a_n} \max_{i \le k} \Psi_i(\theta, \phi) \ge \inf_{\theta : d(\theta, \Theta(\phi)) \ge C_j a_n} \max_{i \le k} \Psi_i(\theta, \phi) > a_n.$$

This is true for any $\phi \in U(\phi_0)$. Hence the result follows. □
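Lemma D.3. For any $M > 0$, there exist $\delta > 0$ and a neighborhood $U(\phi_0)$ of $\phi_0$, so that

$$\inf_{\phi \in U(\phi_0)} \ \inf_{d(\theta, \Theta(\phi)) \ge \delta \sqrt{(\log n)/n}} Q(\theta, \phi) > M \sqrt{\frac{\log n}{n}}.$$

Proof. For any $M > 0$, by Lemma D.2 there exist $U(\phi_0)$ and $\delta > 0$ so that

$$\inf_{\phi \in U(\phi_0)} \ \inf_{d(\theta, \Theta(\phi)) \ge \delta \sqrt{(\log n)/n}} \ \max_{i \le k} \Psi_i(\theta, \phi) > M \sqrt{\frac{\log n}{n}}. \quad (D.2)$$

Now for any $(\theta, \phi) \in \{ (\theta, \phi) \in \Theta \times U(\phi_0) : d(\theta, \Theta(\phi)) \ge \delta \sqrt{(\log n)/n} \}$, since $\theta \notin \Theta(\phi)$, $\max_{i \le k} \Psi_i(\theta, \phi) > 0$, which implies that $\max_{i \le k} \Psi_i(\theta, \phi) = \max_{i \le k} \Psi_i(\theta, \phi) I(\Psi_i(\theta, \phi) > 0)$. For notational simplicity, let $\Psi_i = \Psi_i(\theta, \phi)$, and $\Psi = (\Psi_1, \ldots, \Psi_k)^T$. Then using the fact that $\max_i A_i^2 = (\max_i A_i)^2$ if $A_i \ge 0$, we have

$$Q(\theta, \phi) = \|\max(\Psi, 0)\| = \left( \sum_{i \le k} [\max(\Psi_i, 0)]^2 \right)^{1/2} \ge \left( \max_{i \le k} [\max(\Psi_i, 0)]^2 \right)^{1/2}$$

$$= \left( [\max_{i \le k} \max(\Psi_i, 0)]^2 \right)^{1/2} = \max_{i \le k} \max(\Psi_i, 0) = \max_{i \le k} \Psi_i I(\Psi_i > 0) = \max_{i \le k} \Psi_i(\theta, \phi).$$

The result follows immediately from (D.2). □

To simplify our notation, let us define

$$r_n = \sqrt{\frac{\log n}{n}}.$$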

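Lemma D.4. There exists a constant $C > 0$ so that

$$P(\Theta(\phi) \subset \Theta(\phi_0)^{C r_n} \mid D_n) \to^p 1,$$

where $\Theta(\phi_0)^{C r_n} = \{ \theta \in \Theta : d(\theta, \Theta(\phi_0)) \le C r_n \}$, and $P(\cdot \mid D_n)$ denotes the marginal posterior probability of $\phi$.

Proof. For any $C > 0$, let $\Theta \setminus \Theta(\phi_0)^{C r_n} = \{ \theta \in \Theta : d(\theta, \Theta(\phi_0)) > C r_n \}$. Suppose it is true that there exists $C > 0$ so that

$$P\left( \inf_{\theta \in \Theta \setminus \Theta(\phi_0)^{C r_n}} Q(\theta, \phi) > \sup_{\theta \in \Theta(\phi)} Q(\theta, \phi) \,\Big|\, D_n \right) \to^p 1, \quad (D.3)$$

then the lemma holds, because on the event $\inf_{\theta \in \Theta \setminus \Theta(\phi_0)^{C r_n}} Q(\theta, \phi) > \sup_{\theta \in \Theta(\phi)} Q(\theta, \phi)$ we have $\Theta(\phi) \subset \Theta(\phi_0)^{C r_n}$. Thus it suffices to show (D.3). Note that $\sup_{\theta \in \Theta(\phi)} Q(\theta, \phi) = 0$, since $\forall \theta \in \Theta(\phi)$, $\Psi(\theta, \phi) \le 0$, which is equivalent to $Q(\theta, \phi) = 0$. On the other hand, we have $P(\|\phi - \phi_0\| \le r_n \mid D_n) \to^p 1$. Therefore, it remains to show

$$P\left( \inf_{\theta \in \Theta \setminus \Theta(\phi_0)^{C r_n}} Q(\theta, \phi) > 0 \,\Big|\, D_n \right) \to^p 1. \quad (D.4)$$

In fact, for any $\phi$ so that $\|\phi - \phi_0\| \le r_n$, by Lemma D.1 there exists $K > 0$ such that, for any $C > 0$,

$$\inf_{\theta \in \Theta \setminus \Theta(\phi_0)^{C r_n}} Q(\theta, \phi) \ge \inf_{\theta \in \Theta \setminus \Theta(\phi_0)^{C r_n}} Q(\theta, \phi_0) - \sup_{\theta \in \Theta} |Q(\theta, \phi) - Q(\theta, \phi_0)| \ge \inf_{\theta \in \Theta \setminus \Theta(\phi_0)^{C r_n}} Q(\theta, \phi_0) - K r_n. \quad (D.5)$$

Now by Lemma D.3 there exists $C > 0$ such that

$$\inf_{\theta \in \Theta \setminus \Theta(\phi_0)^{C r_n}} Q(\theta, \phi_0) = \inf_{d(\theta, \Theta(\phi_0)) \ge C r_n} Q(\theta, \phi_0) > 3 K r_n.$$

Hence we have shown that whenever $\|\phi - \phi_0\| \le r_n$, $\inf_{\theta \in \Theta \setminus \Theta(\phi_0)^{C r_n}} Q(\theta, \phi) > 2 K r_n > 0$. Therefore, by the posterior concentration for $\phi$, (D.4) holds from

$$P\left( \inf_{\theta \in \Theta \setminus \Theta(\phi_0)^{C r_n}} Q(\theta, \phi) > 0 \,\Big|\, D_n \right) \ge P(\|\phi - \phi_0\| \le r_n \mid D_n) \to^p 1.$$

□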



Lemma D.5. There exists $L > 0$ so that $P(\Theta(\phi_0) \subset \Theta(\phi)^{L r_n} \mid D_n) \to^p 1$.

Proof. By Lemma D.1 there exists $K > 0$ so that whenever $\|\phi - \phi_0\| \le r_n$,

$$\sup_{\theta \in \Theta} |Q(\theta, \phi) - Q(\theta, \phi_0)| \le K r_n.$$

We now fix such a $\phi$; then for all large enough $n$, $\phi \in U(\phi_0)$ where $U(\phi_0)$ is the neighborhood defined in Lemma D.3. For such a $K$, by Lemma D.3 there exists $L > 0$ that does not depend on $\phi$ (since the following inequality holds uniformly for $\phi \in U(\phi_0)$ by Lemma D.3) such that

$$\inf_{d(\theta, \Theta(\phi)) \ge L r_n} Q(\theta, \phi) > K r_n,$$

which then implies that $\{ \theta : Q(\theta, \phi) \le K r_n \} \subset \{ \theta : d(\theta, \Theta(\phi)) \le L r_n \}$. On the other hand, for any $\theta \in \Theta(\phi_0)$, $Q(\theta, \phi_0) = 0$, which implies $Q(\theta, \phi) \le 0 + |Q(\theta, \phi) - Q(\theta, \phi_0)| \le K r_n$. Therefore, $\Theta(\phi_0) \subset \{ \theta : Q(\theta, \phi) \le K r_n \} \subset \{ \theta : d(\theta, \Theta(\phi)) \le L r_n \}$. Hence we have in fact shown that the event $\|\phi - \phi_0\| \le r_n$ implies the event $\Theta(\phi_0) \subset \Theta(\phi)^{L r_n}$. Moreover, the event $\|\phi - \phi_0\| \le r_n$ occurs with probability approaching one under the posterior distribution of $\phi$, which then implies the result. □

Lemma D.6. For two sets $A, B$, if $A \subset B^{r_1}$ and $B \subset A^{r_2}$ for some $r_1, r_2$, then

$$d_H(A, B) \le \max\{r_1, r_2\}.$$

Proof. $d_H(A, B) = \max\{ \sup_{a \in A} d(a, B), \sup_{b \in B} d(b, A) \}$. Then $\forall a \in A$, since $A \subset B^{r_1}$, $a \in B^{r_1}$, that is, $d(a, B) \le r_1$. This implies $\sup_{a \in A} d(a, B) \le r_1$. Similarly we can show $\sup_{b \in B} d(b, A) \le r_2$. □
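Lemma D.6 can be illustrated on finite subsets of the real line, where the Hausdorff distance is directly computable. The sketch below is only an illustration under that simplifying assumption (finite sets, absolute-value distance); the function names and the example sets in the check are hypothetical.

```python
def dist_to_set(x, S):
    """Distance d(x, S) from a point x to a finite set S of reals."""
    return min(abs(x - s) for s in S)


def hausdorff(A, B):
    """Hausdorff distance max{sup_a d(a, B), sup_b d(b, A)} for finite sets."""
    return max(max(dist_to_set(a, B) for a in A), max(dist_to_set(b, A) for b in B))


def in_enlargement(A, B, r):
    """True if every point of A lies within distance r of B, i.e. A is contained in B^r."""
    return all(dist_to_set(a, B) <= r for a in A)
```

For instance, whenever `in_enlargement(A, B, r1)` and `in_enlargement(B, A, r2)` both hold, `hausdorff(A, B)` is at most `max(r1, r2)`, which is the statement of the lemma.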

Proof of Theorem 4.1

Theorem 4.1 follows from combining Lemmas D.4-D.6. Q.E.D.



E Proofs for Section 5 

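Lemma E.1. If $\Psi(\theta, \phi_0)$ contains a subvector of strictly convex functions $\Psi_S(\cdot, \phi_0)$ of $\theta$, then the function $\theta \mapsto \Psi_S(\theta, \phi)$ is strictly convex for all $\phi \in B(\phi_0, \delta)$.

Proof. We fix a constant $\delta > 0$ to be determined later. Then for any $\phi \in B(\phi_0, \delta)$, $\lambda \in [0, 1]$ and $\theta_1, \theta_2 \in \Theta$, we want to show $\Psi_S(\lambda \theta_1 + (1 - \lambda) \theta_2, \phi) < \lambda \Psi_S(\theta_1, \phi) + (1 - \lambda) \Psi_S(\theta_2, \phi)$. In fact, since $\Psi_S(\theta, \phi_0)$ is strictly convex in $\theta$, there is $\epsilon_0 > 0$ such that $\Psi_S(\lambda \theta_1 + (1 - \lambda) \theta_2, \phi_0) < \lambda \Psi_S(\theta_1, \phi_0) + (1 - \lambda) \Psi_S(\theta_2, \phi_0) - \epsilon_0$. Then by the continuity of $\phi \mapsto \Psi_S(\theta, \phi)$ at $\phi_0$, there is $\delta > 0$ such that whenever $\|\phi - \phi_0\| < \delta$, we have

$$\Psi_S(\theta_1, \phi_0) < \Psi_S(\theta_1, \phi) + \epsilon_0/3, \quad \Psi_S(\theta_2, \phi_0) < \Psi_S(\theta_2, \phi) + \epsilon_0/3$$

and $\Psi_S(\lambda \theta_1 + (1 - \lambda) \theta_2, \phi) < \Psi_S(\lambda \theta_1 + (1 - \lambda) \theta_2, \phi_0) + \epsilon_0/3$. Therefore,

$$\Psi_S(\lambda \theta_1 + (1 - \lambda) \theta_2, \phi) < \Psi_S(\lambda \theta_1 + (1 - \lambda) \theta_2, \phi_0) + \epsilon_0/3$$

$$< \lambda \Psi_S(\theta_1, \phi_0) + (1 - \lambda) \Psi_S(\theta_2, \phi_0) - \epsilon_0 + \epsilon_0/3 < \lambda \Psi_S(\theta_1, \phi) + (1 - \lambda) \Psi_S(\theta_2, \phi).$$

□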



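E.1 Proof of Theorem 5.1

For any $\tau \in [0, 1]$ and any $\phi_1, \phi_2 \in B(\phi_0, r_n)$, define $\phi_\tau = \tau \phi_1 + (1 - \tau) \phi_2$ with $\phi_2 = \phi_\tau|_{\tau=0}$ and $\phi_1 = \phi_\tau|_{\tau=1}$. For every $p \in \mathbb{S}^d$ the support function may be rewritten as a function of $\tau$: $S_{\phi_\tau}(p) : [0, 1] \to \mathbb{R}$. By Lemma G.14 the support function $S_{\phi_\tau}(p)$ is differentiable at $\tau = \tau_0 \in (0, 1)$ and then we can apply the mean value theorem to $S_{\phi_\tau}(p)$:

$$S_{\phi_1}(p) - S_{\phi_2}(p) = \frac{\partial S_{\phi_\tau}(p)}{\partial \tau} \Big|_{\tau = \tau_0 \in (0,1)}. \quad (E.1)$$

By defining $\tau_0 : \mathbb{S}^d \to (0, 1)$ as a measurable and differentiable function of $p$ and by using the result of Lemma G.14 we obtain

$$\frac{\partial}{\partial \tau} S_{\phi_\tau}(p) \Big|_{\tau = \tau_0(p)} = \lambda(p, \phi_{\tau_0(p)})^T \nabla_\phi \Psi(\theta(p), \phi_{\tau_0(p)}) [\phi_1 - \phi_2] \quad (E.2)$$

for some $\theta(p) \in \Xi(p, \phi_{\tau_0(p)})$, $p \in \mathbb{S}^d$. By plugging (E.2) in (E.1) and developing further we obtain

$$S_{\phi_1}(p) - S_{\phi_2}(p) = \lambda(p, \phi_{\tau_0(p)})^T \nabla_\phi \Psi(\theta(p), \phi_{\tau_0(p)}) [\phi_1 - \phi_2]$$

$$= \lambda(p, \phi_0)^T \nabla_\phi \Psi(\theta_*(p), \phi_0) [\phi_1 - \phi_2] \quad (E.3)$$

$$+ \left( \lambda(p, \phi_{\tau_0(p)})^T \nabla_\phi \Psi(\theta(p), \phi_{\tau_0(p)}) - \lambda(p, \phi_0)^T \nabla_\phi \Psi(\theta_*(p), \phi_0) \right) [\phi_1 - \phi_2],$$

where $\theta_* : \mathbb{S}^d \to \Theta$ is a Borel measurable mapping satisfying $\theta_*(p) \in \Xi(p, \phi_0)$. Denote

$$f(\phi_1, \phi_2) = \sup_{p \in \mathbb{S}^d} \left( \lambda(p, \phi_{\tau_0(p)})^T \nabla_\phi \Psi(\theta(p), \phi_{\tau_0(p)}) - \lambda(p, \phi_0)^T \nabla_\phi \Psi(\theta_*(p), \phi_0) \right) [\phi_1 - \phi_2].$$

By Lemma G.15, $\frac{f(\phi_1, \phi_2)}{\|\phi_1 - \phi_2\|}$ converges to 0 uniformly in $(\phi_1, \phi_2) \in B(\phi_0, r_n)$ as $r_n \to 0$.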


Finally, we analyze the first term in IE. 31 By lemma IG.9I and lemma IG.10| the function 
p i — y X(p, <ft Q ) is continuous in p G S d and therefore it attains its supremum. Moreover, 
sup pg gd V</,\I/ (f> ) < sup ege V</,\l/(#, O ) an d the supremum is attained since, under as- 
sumptions and EH (i), !->■ V ^ (9 , cj) ) is uniformly continuous on 9. We have 



sup \ (S^(p) - Sfc(p)) - A(p, 0o) T V ^(6', t (p),0o)[0i - 2 ] 

pes d 



sup 



A(P, 0r () (p)) V^(0(p), To(p) ) " A(p, 00)' V^(0*(p), O ) [01-0 



=: /(0i,0 2 ). 



(E.4) 



(E.5) 



E.2 Proof of Theorem 5.2

Proof. Denote $r_n = (\log n)^{1/2} n^{-1/2}$ and $\Omega = \{ \phi \in B(\phi_0, r_n) \}$. Under Assumption 4.1, $P(\Omega^c \mid D_n) = o_p(1)$. Then, by using the expansion of the support function given in Lemma G.16, we have



p(su P \sm - SM\ > Cr n D n ) = p({su P ism - SMI > Cr 4 n n 

+ p(sup \sm - S M\ > Cr n n fi c £>„) 

< pfsup \sm - S M\ >Cr n nn D n ) + P(n c \D n ) 



D, 



<p(/(0i,02) + sup|A(p,^o)'V^(^(p),0 o )[0-0o]|>Cr n n n 
v pes d 

< P(o(||0 - O ||) + sup |A(p, o )'V^(0*(p), 0o)|| 110 - 0o|| > Cr n 
which converges to in probability under assumption 14.11 
E.3 Proof of Theorem Q 



D n ) + o p (l) 



D n )P(n\D n ) + o p (l) 

□ 



Proof. Denote r n = (}ogn) 1/2 n 1/2 , Q := {(f> e P(0 o ,r n )} and h n := y/n sup pe § d (SM 
S<fo(p))- Since the Total Variation distance is bounded by 2 we have: 

EUP^i^-^A^, T^)\\ TV = E\\P hnlDn -M(A n ^J^)\\ TV I n 
+ ^\\Ph n \D n - X(A n ^ , Ifo)\\TvIsi° 
< E\\P hnlDn -Af(A n ^ J- o 1 )\\ TV I n + 2P(n c ). 



By the expansion of the support function given in lemma IG. 161 the element h n is asymp- 
totically equal to y/n\X(p, 0o) / V^,^ r (0*(p), 0o)[0 — 0o]- Moreover, under assumption 14.11 
P(tt c \D n ) = 0p (l). Therefore, E\\P hn \ Dn - N{A n ^J^)\\ T v equals 



EHP^sup^gd [A(p,Wv^ff(fl.(p),^o)[*-^,]||u„ -^Kai )llrWn + o(l) + o p (l) 
which converges to under assumption 15.71 



□ 



F Proofs for Section 6 



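F.1 Proof of Theorem 6.1

Lemma F.1. For any consistent estimator $\hat{\phi}$ such that $\|\hat{\phi} - \phi_0\| = o_p(1)$, $P(\theta \notin \Theta(\hat{\phi}) \mid D_n) = o_p(1)$.

Proof. Straightforward calculation shows

$$P(\theta \notin \Theta(\hat{\phi}) \mid D_n) = \int \pi(\theta \notin \Theta(\hat{\phi}) \mid \phi) p(\phi \mid D_n) d\phi \le \int \pi(\theta \notin \Theta(\phi_0) \mid \phi) p(\phi \mid D_n) d\phi$$

$$+ \int |\pi(\theta \notin \Theta(\hat{\phi}) \mid \phi) - \pi(\theta \notin \Theta(\phi_0) \mid \phi)| \, p(\phi \mid D_n) d\phi = A + B.$$

We investigate $A$ and $B$ respectively. First of all, by the posterior concentration of $\phi$, there is $r_n = o(1)$ such that (note that $\pi(\theta \mid \phi) = \pi(\theta \mid \phi) I_{\theta \in \Theta(\phi)}$)

$$A = \int_{\|\phi - \phi_0\| < r_n} \pi(\theta \notin \Theta(\phi_0) \mid \phi) p(\phi \mid D_n) d\phi + o_p(1)$$

$$= \int_{\|\phi - \phi_0\| < r_n} \int_{\theta \notin \Theta(\phi_0),\ \theta \in \Theta(\phi)} \pi(\theta \mid \phi) d\theta \, p(\phi \mid D_n) d\phi + o_p(1).$$

By the expansion of the support function, $\|\phi - \phi_0\| < r_n$ implies $d_H(\Theta(\phi), \Theta(\phi_0)) \le C r_n$ for some $C > 0$. Hence $\Theta(\phi) \subset \Theta(\phi_0)^{C r_n}$, which yields

$$A \le \int_{\|\phi - \phi_0\| < r_n} \int_{\theta \notin \Theta(\phi_0),\ \theta \in \Theta(\phi_0)^{C r_n}} \pi(\theta \mid \phi) d\theta \, p(\phi \mid D_n) d\phi + o_p(1) = o_p(1),$$

where the last equality follows since $r_n = o(1)$ and $\sup_{\theta,\phi} \pi(\theta \mid \phi) < \infty$. On the other hand, let $H = \{ \theta \in \Theta(\phi_0), \theta \notin \Theta(\hat{\phi}) \} \cup \{ \theta \in \Theta(\hat{\phi}), \theta \notin \Theta(\phi_0) \}$. Then

$$\sup_{\phi \in \Phi} |\pi(\theta \notin \Theta(\hat{\phi}) \mid \phi) - \pi(\theta \notin \Theta(\phi_0) \mid \phi)| \le \sup_{\phi \in \Phi} \pi(\theta \in H \mid \phi).$$

Due to the consistency of $\hat{\phi}$ and the expansion of the support function, $d_H(\Theta(\phi_0), \Theta(\hat{\phi})) = o_p(1)$, and therefore $\mu(H) = o_p(1)$, which implies $\sup_{\phi \in \Phi} \pi(H \mid \phi) = o_p(1)$ since $\sup_{\theta,\phi} \pi(\theta \mid \phi) < \infty$. We conclude that $B = o_p(1)$. □

Now to finish proving part (i) of the theorem, noting that $\Theta(\hat{\phi}) \subset \mathrm{FCS}(\tau)$, we have

$$P(\theta \notin \mathrm{FCS}(\tau) \mid D_n) \le P(\theta \notin \Theta(\hat{\phi}) \mid D_n) = o_p(1),$$

which by definition leads to the conclusion of part (i).

For part (ii), by the definition of the BCS, $P(\theta \in \mathrm{BCS}(\tau) \mid D_n) = 1 - \tau$, so we have

$$P(\theta \in \mathrm{FCS}(\tau), \theta \notin \mathrm{BCS}(\tau) \mid D_n) \le P(\theta \notin \mathrm{BCS}(\tau) \mid D_n) = \tau.$$

On the other hand, Lemma F.1 implies $P(\theta \notin \Theta(\hat{\phi}) \cup \theta \in \mathrm{BCS}(\tau) \mid D_n) \le o_p(1) + 1 - \tau$. Hence

$$P(\theta \in \mathrm{FCS}(\tau), \theta \notin \mathrm{BCS}(\tau) \mid D_n) \ge P(\theta \in \Theta(\hat{\phi}), \theta \notin \mathrm{BCS}(\tau) \mid D_n) \ge 1 - (o_p(1) + 1 - \tau) = \tau + o_p(1).$$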


F.2 Proof of Theorems 6.2 and 6.3

Theorem 6.2 has been proved in the main text. We now prove Theorem 6.3.

Lemma F.2. Suppose that Assumptions 5.1-5.6 hold with $\delta = r_n$. Then, for any $x > 0$,

$$P\left( \sqrt{n} \sup_{\|p\| = 1} |S_\phi(p) - S_{\hat{\phi}_M}(p)| \le x \,\Big|\, D_n \right) - P_{D_n}\left( \sqrt{n} \sup_{\|p\| = 1} |S_{\phi_0}(p) - S_{\hat{\phi}_M}(p)| \le x \right) = o_p(1).$$

Proof. For $\theta_*(p)$ and $\lambda(p, \phi_0)$ defined in Lemma G.16, define

$$f_p^n(\phi_1, \phi_2) = \sqrt{n} \lambda(p, \phi_0)^T \nabla_\phi \Psi(\theta_*(p), \phi_0)(\phi_1 - \phi_2),$$

where $\theta_*(p)$ and $\lambda(p, \phi_0)$ do not depend on the specific choice of $\phi_1$ and $\phi_2$. Then Lemma G.16 implies that

$$\sup_{\phi_1, \phi_2 \in B(\phi_0, r_n)} \ \sup_{\|p\| = 1} |\sqrt{n}(S_{\phi_1}(p) - S_{\phi_2}(p)) - f_p^n(\phi_1, \phi_2)| = o(1). \quad (F.1)$$

For notational simplicity, we further write $g_n(\phi_1, \phi_2) = \sqrt{n} \sup_{\|p\| = 1} |S_{\phi_1}(p) - S_{\phi_2}(p)|$. Then

$$|P(g_n(\phi, \hat{\phi}_M) \le x \mid D_n) - P_{D_n}(g_n(\phi_0, \hat{\phi}_M) \le x)|$$

$$\le \Big| P\Big( \sup_{\|p\| = 1} |f_p^n(\phi, \hat{\phi}_M)| \le x \,\Big|\, D_n \Big) - P_{D_n}\Big( \sup_{\|p\| = 1} |f_p^n(\phi_0, \hat{\phi}_M)| \le x \Big) \Big|$$

$$+ \Big| P(g_n(\phi, \hat{\phi}_M) \le x \mid D_n) - P\Big( \sup_{\|p\| = 1} |f_p^n(\phi, \hat{\phi}_M)| \le x \,\Big|\, D_n \Big) \Big|$$

$$+ \Big| P_{D_n}\Big( \sup_{\|p\| = 1} |f_p^n(\phi_0, \hat{\phi}_M)| \le x \Big) - P_{D_n}(g_n(\phi_0, \hat{\phi}_M) \le x) \Big|$$

$$= a_1 + a_2 + a_3.$$

It remains to show that $a_i = o_p(1)$ for $i = 1, 2, 3$. By the Bernstein-von Mises theorem and the asymptotic normality of $\hat{\phi}_M$ (Assumption 6.2), the posterior of $\sqrt{n}(\phi - \hat{\phi}_M) \mid D_n$ and the sampling distribution of $\sqrt{n}(\hat{\phi}_M - \phi_0)$ are asymptotically identically distributed. This implies that $a_1 = o_p(1)$. On the other hand, (F.1) implies $\sup_{\phi_1, \phi_2 \in B(\phi_0, r_n)} |g_n(\phi_1, \phi_2) - \sup_{\|p\| = 1} |f_p^n(\phi_1, \phi_2)|| = o(1)$, where we used the inequality $|\sup_x g_1(x) - \sup_x g_2(x)| \le \sup_x |g_1(x) - g_2(x)|$ for any two functions $g_1(x)$ and $g_2(x)$. Therefore, if we write $\Delta = |g_n(\phi, \hat{\phi}_M) - \sup_{\|p\| = 1} |f_p^n(\phi, \hat{\phi}_M)||$,

$$a_2 \le |P(g_n(\phi, \hat{\phi}_M) \le x \mid D_n) - P(g_n(\phi, \hat{\phi}_M) \le x + \Delta \mid D_n)|$$

$$+ |P(g_n(\phi, \hat{\phi}_M) \le x \mid D_n) - P(g_n(\phi, \hat{\phi}_M) \le x - \Delta \mid D_n)| = o_p(1)$$

since $\Delta = o_p(1)$. Similarly, $a_3 = o_p(1)$. □
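Proof of Theorem 6.3

$$P_{D_n}\left( \Theta(\hat{\phi}_M)^{-q_\tau/\sqrt{n}} \subset \Theta(\phi_0) \subset \Theta(\hat{\phi}_M)^{q_\tau/\sqrt{n}} \right) = P_{D_n}\left( \sqrt{n} \sup_{\|p\| = 1} |S_{\phi_0}(p) - S_{\hat{\phi}_M}(p)| \le q_\tau \right)$$

$$\ge P(J(\phi) \le q_\tau \mid D_n) + o_p(1) = 1 - \tau + o_p(1),$$

where the inequality follows from Lemma F.2.

Proof of Corollary 6.1

Proof. For any fixed $\theta \in \Theta(\phi_0)$,

$$P_{D_n}\left( \theta \in \Theta(\hat{\phi}_M)^{q_\tau/\sqrt{n}} \right) \ge P_{D_n}\left( \Theta(\phi_0) \subset \Theta(\hat{\phi}_M)^{q_\tau/\sqrt{n}} \right) \ge 1 - \tau + o_p(1),$$

where the $o_p(1)$ term is uniform in $\theta \in \Theta(\phi_0)$. This gives the result. □

G Technical Lemmas

We recall some technical notation that will be used throughout this section.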

- $d = \dim(\theta)$, $d_\phi = \dim(\phi)$;

- $\Psi_S(\theta, \phi)$ is the $k_S$-subvector of $\Psi(\theta, \phi)$ containing the constraints that are strictly convex functions of $\theta$ and $\lambda_S(p, \phi)$ are the corresponding Lagrange multipliers for $p \in \mathbb{S}^d$;

- $\Psi_L(\theta, \phi)$ is the $k_L$-subvector of $\Psi(\theta, \phi)$ containing the constraints that are linear in $\theta$ and $\lambda_L(p, \phi)$ are the corresponding Lagrange multipliers for $p \in \mathbb{S}^d$;

- $\Xi(p, \phi) = \arg\max_{\theta \in \Theta} \{ p^T \theta;\ \Psi(\theta, \phi) \le 0 \}$ is the support set of $\Theta(\phi)$;

- $\nabla_\phi \Psi(\theta, \phi)$ is the $k \times d_\phi$ matrix of partial derivatives of $\Psi$ with respect to $\phi$;

- $\forall \theta \in \Theta(\phi)$ and $\phi \in B(\phi_0, r_n)$, we denote by $\mathrm{Act}(\theta, \phi) := \{ i;\ \Psi_i(\theta, \phi) = 0 \}$ the set of the active inequality constraint indices and by $d_A(\theta, \phi)$ the number of its elements;

- $\forall i \in \mathrm{Act}(\theta, \phi)$, $\nabla_\theta \Psi_i(\theta, \phi)$ denotes the $d$-vector of partial derivatives of $\Psi_i$ with respect to $\theta$.

Lemma G.1. Under Assumptions 5.1 and 5.3 (iii) with $\delta = r_n$, if $(\tilde{\theta}, \tilde{\phi}) \in \Theta \times B(\phi_0, r_n)$ are such that $\Psi(\tilde{\theta}, \tilde{\phi}) < 0$, then there exists an $N$ such that whenever $n > N$ we have that $\Psi(\tilde{\theta}, \phi) < 0$ for every $\phi \in B(\phi_0, r_n)$.

Proof. Under Assumption 5.3 (iii) with $\delta = r_n$, for every $\phi \in B(\phi_0, r_n)$ there exists a $\theta \in \Theta$ such that $\Psi(\theta, \phi) < 0$. Denote by $(\tilde{\theta}, \tilde{\phi})$ this value (i.e. $\Psi(\tilde{\theta}, \tilde{\phi}) < 0$). By Assumption 5.1 the function $\Psi(\theta, \phi)$ is continuous in $(\theta, \phi)$, so there is an $N$ such that whenever $n > N$ we have that $\Psi(\tilde{\theta}, \phi) < 0$ for every $\phi \in B(\phi_0, r_n)$.
Q.E.D.

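Lemma G.2. Let assumptions 5.1 and 5.2 be satisfied with $\delta = r_n$. For every $\varepsilon_n > 0$ there exists an $N$ such that for every $n > N$ and $\phi \in B(\phi_0,r_n)$

$$\sup_{\theta\in\Theta}\|\Psi(\theta,\phi) - \Psi(\theta,\phi_0)\| < \varepsilon_n. \tag{G.1}$$

Proof. Under assumption 5.1, the function $\phi \mapsto \Psi(\theta,\phi)$ is continuous on $\Phi$, for every $\theta \in \Theta$, and uniformly continuous on $B(\phi_0,r_n)$, due to the compactness of $B(\phi_0,r_n)$. Therefore, $\forall\varepsilon_n > 0$ there exists a $\delta_\theta > 0$ such that $\forall\phi \in B(\phi_0,r_n)$: $\|\Psi(\theta,\phi) - \Psi(\theta,\phi_0)\| < \varepsilon_n$ for every $\theta \in \Theta$. Now, for every $\phi \in B(\phi_0,r_n)$ denote $f_\phi(\theta) = \|\Psi(\theta,\phi) - \Psi(\theta,\phi_0)\|$ and

$$A_{\delta_\theta} := \{\theta \in \Theta;\ f_\phi(\theta) < \varepsilon_n,\ \forall\phi \in B(\phi_0,r_n);\ r_n < \delta_\theta\}$$

for every $\theta \in \Theta$, $\varepsilon_n > 0$ and $\delta_\theta > 0$. This means that $\forall\theta \in \Theta$ there is a $\delta_\theta$ such that $\theta \in A_{\delta_\theta}$. Under assumption 5.1, $f_\phi(\theta)$ is continuous in $\theta$, hence $A_{\delta_\theta}$ is an open set and $\bigcup_{\theta\in\Theta} A_{\delta_\theta}$ is an open cover of $\Theta$: $\Theta \subset \bigcup_{\theta\in\Theta} A_{\delta_\theta}$. Due to compactness of $\Theta$ (see assumption 5.2) there exists a finite set $\{\delta_1,\ldots,\delta_K\}$, $K < \infty$, such that $\{A_{\delta_i}\}_{i=1}^K$ is a subcover of $\Theta$, that is, $\Theta \subset \bigcup_{i=1}^K A_{\delta_i}$.
Let $\delta^* = \min\{\delta_1,\ldots,\delta_K\}$ so that $A_{\delta_i} \subset A_{\delta^*}$ for every $i = 1,\ldots,K$ and for any $\theta \in \Theta$ we have $\theta \in A_{\delta^*}$. Remark that this $\delta^*$ does not depend on $\theta$. This then implies that for any $\phi \in B(\phi_0,r_n)$ and $r_n < \delta^*$

$$\sup_{\theta\in\Theta}\|\Psi(\theta,\phi) - \Psi(\theta,\phi_0)\| < \varepsilon_n.$$

Q.E.D.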
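The uniform bound (G.1) can be checked numerically on a toy example. The sketch below assumes hypothetical constraints $\Psi(\theta,\phi) = (\|\theta\|^2 - \phi_1,\ \phi_2\theta_1 - \theta_2)$ on $\Theta = [-1,1]^2$, which are not taken from the paper; for this choice the gap is at most $\|\phi - \phi_0\|$, so shrinking the radius of the ball shrinks the supremum at the same rate. The supremum is approximated on a grid of $\theta$ values and on points of the boundary of the ball.

```python
import numpy as np


def psi(theta, phi):
    """Hypothetical smooth constraints used only for illustration."""
    return np.array([theta[0] ** 2 + theta[1] ** 2 - phi[0],
                     phi[1] * theta[0] - theta[1]])


def sup_gap(phi0, r, n_theta=21, n_dir=16):
    """Grid approximation of sup over theta and ||phi - phi0|| = r of ||Psi(theta, phi) - Psi(theta, phi0)||."""
    grid = np.linspace(-1.0, 1.0, n_theta)
    angles = np.linspace(0.0, 2.0 * np.pi, n_dir, endpoint=False)
    worst = 0.0
    for a in angles:
        phi = phi0 + r * np.array([np.cos(a), np.sin(a)])
        for t1 in grid:
            for t2 in grid:
                theta = np.array([t1, t2])
                gap = np.linalg.norm(psi(theta, phi) - psi(theta, phi0))
                worst = max(worst, gap)
    return worst
```

Evaluating `sup_gap` for a decreasing sequence of radii gives a decreasing sequence of gaps, which is the behaviour the lemma formalises with $\delta = r_n$.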

Lemma G.3. Under assumptions 5.1, 5.2 and 5.3 (iii) with $\delta = r_n$, there exists an $N$ such that for every $n > N$ the correspondence $\phi \mapsto \Theta(\phi)$ is well defined and continuous at all $\phi \in B(\phi_0,r_n)$, that is, it is upper and lower hemicontinuous.

Proof. This proof follows the lines of the proof of Lemma B.3 in Kaido and Santos (2011) with minor modifications. First, under assumptions 5.1 and 5.3 (iii) the set $\Theta(\phi)$ is a convex set with nonempty interior for every $\phi \in B(\phi_0,r_n)$.

Next, we have to show that the correspondence $\phi \mapsto \Theta(\phi)$ is continuous (for a definition of continuity of a correspondence see for instance Definition 17.2 in Aliprantis and Border (2006)).



First, we show that $\phi \mapsto \Theta(\phi)$ is lower hemicontinuous at any $\phi \in B(\phi_0,r_n)$. We show this by verifying Theorem 17.19 (ii) in Aliprantis and Border (2006), that is, for any $\theta^* \in \Theta(\phi)$ (i.e. $\Psi(\theta^*,\phi) \le 0$) and net $\{\phi_j\}$ with $\phi_j \to \phi$, there exists a subnet $\{\phi_{j_\beta}\}_{\beta\in\Gamma}$ and a net $\{\theta_\beta\}_{\beta\in\Gamma}$ such that $\theta_\beta \in \Theta(\phi_{j_\beta})$, for every $\beta \in \Gamma$, and $\theta_\beta \to \theta^*$. In order to show this, consider a net $\phi_j \to \phi$. Then we distinguish between two cases.

Case I: $\theta^* \in \mathrm{int}(\Theta(\phi))$, i.e. $\Psi(\theta^*,\phi) < 0$. By Lemma G.1 there exists an $N$ such that for every $n > N$ there exists a $j_n$ such that $\phi_j \in B(\phi_0,r_n)$ and $\Psi(\theta^*,\phi_j) < 0$ for every $j > j_n$. Define $\Gamma = \{j > j_n\}$ and $\phi_j = \phi_\beta$ with $\beta \in \Gamma$. Fix $\theta_\beta = \theta^*$ (i.e. $\theta_\beta$ is a constant net equal to $\theta^*$) so that $\theta_\beta \in \Theta(\phi_\beta)$ for every $\beta \in \Gamma$ and $\theta_\beta \to \theta^*$.

Case II: $\theta^* \in \partial\Theta(\phi)$. Since $\Theta(\phi)$ is convex with non-empty interior, there exists $\{\theta_\lambda\}$ that belongs to $\mathrm{int}(\Theta(\phi))$, for every $\lambda$, with $\theta_\lambda \to \theta^*$. By Lemma G.1 there exists an $N$ such that for every $n > N$ there exists a $j_{n,\lambda}$ such that $\phi_j \in B(\phi_0,r_n)$ and $\Psi(\theta_\lambda,\phi_j) < 0$ for every $j > j_{n,\lambda}$ (i.e. $\theta_\lambda \in \Theta(\phi_j)$ for every $j > j_{n,\lambda}$). Since $B(\phi_0,r_n)$ is compact, every convergent net admits a convergent subnet, that is, there exists $l_n$ such that $\{\phi_{j_\beta}\}_{\beta\in\Gamma}$ converges to $\phi$ where $\Gamma = \{j > \max\{l_n, j_{n,\lambda}\}\}$. The corresponding $\theta_\beta = \theta_\lambda$ satisfies $\theta_\beta \to \theta^*$ by construction, $\theta_\beta \in \Theta(\phi_{j_\beta})$ and $\theta^* \in \partial\Theta(\phi)$.

Now, let us show that the correspondence $\phi \mapsto \Theta(\phi)$ is upper hemicontinuous at any $\phi \in B(\phi_0,r_n)$. By Theorem 17.16 in Aliprantis and Border (2006) it is sufficient to show that for every net $\{\phi_j,\theta_j\}$ such that $\theta_j \in \Theta(\phi_j)$ for each $j$ (i.e. $(\phi_j,\theta_j)$ is in the graph of $\Theta(\cdot)$), if $\phi_j \to \phi$ then $\theta_j \to \theta' \in \Theta(\phi)$.

To show this, first observe that, since $\Theta$ is compact, a net $\theta_j \in \Theta(\phi_j)$ has a convergent subnet $\theta_{j_\beta} \in \Theta(\phi_{j_\beta})$, that is, $\theta_{j_\beta} \to \theta^*$ for some $\theta^* \in \Theta$. Therefore, we have to show that $\theta^* \in \Theta(\phi)$. To show this, first remark that there exists $j_n > 0$ such that for every $j > j_n$, $\phi_j \in B(\phi_0,r_n)$. Since $B(\phi_0,r_n)$ is compact, every convergent net $\phi_j \to \phi$ admits a convergent subnet $\{\phi_{j_\beta}\}_{\beta\in\Gamma}$ such that $\phi_{j_\beta} \to \phi$. By the result of Lemma G.2 we have that

$$\Psi(\theta_{j_\beta},\phi_{j_\beta}) - \Psi(\theta_{j_\beta},\phi)$$

converges to 0 uniformly in $\theta$. Moreover, under assumption 5.1 the function $\Psi(\theta,\phi)$ is continuous in $\theta$. This allows us to conclude that

$$\Psi(\theta_{j_\beta},\phi_{j_\beta}) \to \Psi(\theta^*,\phi) \tag{G.2}$$

(because $\Psi(\theta_{j_\beta},\phi_{j_\beta}) - \Psi(\theta^*,\phi) = (\Psi(\theta_{j_\beta},\phi_{j_\beta}) - \Psi(\theta_{j_\beta},\phi)) + (\Psi(\theta_{j_\beta},\phi) - \Psi(\theta^*,\phi))$). Since $\Psi(\theta_{j_\beta},\phi_{j_\beta}) \le 0$ because $\theta_{j_\beta} \in \Theta(\phi_{j_\beta})$, then $\Psi(\theta^*,\phi) \le 0$. We conclude that $\theta^* \in \Theta(\phi)$ and upper hemicontinuity is established. Q.E.D.

Lemma G.4. Let Assumptions 5.1, 5.2 and 5.3 (ii)-(iii) hold with $\delta = r_n$. Then, there exists an $N$ such that for every $n > N$ the correspondence

$$(p,\phi) \mapsto \Xi(p,\phi) = \arg\max_{\theta\in\Theta}\{p^T\theta;\ \Psi(\theta,\phi) \le 0\}$$

has non-empty compact values and it is upper hemicontinuous on $\mathbb{S}^d \times B(\phi_0,r_n)$.

Proof. Let $\tau_0 : \mathbb{S}^d \to (0,1)$ be a measurable and differentiable function of $p$. For every $\phi_1,\phi_2 \in B(\phi_0,r_n)$ define $\phi_{\tau_0(p)} = \tau_0(p)\phi_1 + (1-\tau_0(p))\phi_2$. Lemma G.3 implies that there exists an $N$ such that for every $n > N$ the correspondence $\phi \mapsto \Theta(\phi)$ is well defined and continuous at all $\phi \in B(\phi_0,r_n)$. Therefore, for any $\phi_1,\phi_2 \in B(\phi_0,r_n)$, the correspondence $p \mapsto \Theta(\phi_{\tau_0(\cdot)}) : \mathbb{S}^d \to \mathbb{R}^d$ is continuous because it is the composition of continuous functions.

Under assumption 5.1 the function $\theta \mapsto \Psi(\theta,\phi)$ is continuous in $\theta$ for every $\phi \in \Phi$, so it is also lower semi-continuous. Therefore, for every $i = 1,\ldots,k$ and $\phi \in \Phi$, the lower level sets $\{\theta \in \Theta;\ \Psi_i(\theta,\phi) \le 0\}$ are closed and the set $\Theta(\phi)$ is closed because it is a finite intersection of closed sets. Because $\forall\phi \in \Phi$, $\Theta(\phi) \subset \Theta$ and $\Theta$ is compact (under assumption 5.2), the set $\Theta(\phi)$ is also compact.

Now, under assumption 5.3 (ii) we can apply the "Berge Maximum Theorem", see e.g. Theorem 17.31 in Aliprantis and Border (2006), which guarantees that the correspondence

$$p \mapsto \Xi(p,\phi_{\tau_0(p)}) = \arg\max_{\theta\in\Theta(\phi_{\tau_0(p)})} p^T\theta$$

has nonempty compact values and it is upper hemicontinuous for every $\phi_1,\phi_2 \in B(\phi_0,r_n)$.

By the definition of upper hemicontinuity (see e.g. Theorem 17.16 in Aliprantis and Border (2006)), for every net $\{p_j,\theta_j\}$ such that $\theta_j \in \Xi(p_j,\phi_{\tau_0(p_j)})$, if $p_j \to p$ then $\theta_j \to \theta \in \Xi(p,\phi_{\tau_0(p)})$. The correspondence $p \mapsto \Xi(p,\phi_{\tau_0(p)})$ may be rewritten as a correspondence of two arguments: $(p,\phi_{\tau_0(p)}) \mapsto \Xi(p,\phi_{\tau_0(p)})$, where $(p,\phi_{\tau_0(p)})$ in turn is the value taken by the map $(p,\phi_1,\phi_2) \mapsto (p,\phi_{\tau_0(p)})$, and, since for every net $(p_j,\phi_{\tau_0(p_j)},\theta_j)$ such that $\theta_j \in \Xi(p_j,\phi_{\tau_0(p_j)})$, if $(p_j,\phi_{\tau_0(p_j)}) \to (p,\phi_{\tau_0(p)})$ then $\theta_j \to \theta \in \Xi(p,\phi_{\tau_0(p)})$, it follows that the correspondence $(p,\phi_{\tau_0(p)}) \mapsto \Xi(p,\phi_{\tau_0(p)})$ is also upper hemicontinuous. Since every $\phi \in B(\phi_0,r_n)$ can be represented as $\phi_{\tau_0(p)}$ (if we choose $\phi_1$ and $\phi_2$ on the boundary of $B(\phi_0,r_n)$), we can rewrite the correspondence as $(p,\phi) \mapsto \Xi(p,\phi)$, which is upper hemicontinuous. Q.E.D.

Lemma G.5. Let assumptions 5.1, 5.2 and 5.3 (ii)-(iii) be satisfied with $\delta = r_n$. Let $W \subset \mathbb{S}^d$ be compact and $\Xi(p,\phi_0)$ be a singleton $\forall p \in W$. Then there exists an $N$ such that for every $n > N$ and $\phi \in B(\phi_0,r_n)$ there exists an $\varepsilon_n > 0$ which goes to 0 as $r_n \to 0$ such that for $\theta^*(p) = \Xi(p,\phi_0)$,

$$\sup_{p\in W}\ \sup_{\theta\in\Xi(p,\phi)}\|\theta - \theta^*(p)\| < \varepsilon_n.$$

Proof. For every $(p,\phi) \in \mathbb{S}^d \times B(\phi_0,r_n)$ define $\Xi^\delta(p,\phi) := \{\theta \in \mathbb{R}^d;\ \inf_{\tilde\theta\in\Xi(p,\phi)}\|\theta - \tilde\theta\| < \delta\}$. Since $\|\phi - \phi_0\| < r_n$ and $\Xi(p,\phi) : \mathbb{S}^d \times B(\phi_0,r_n) \to \mathbb{R}^d$ is upper hemicontinuous by lemma G.4 (for $n$ sufficiently large), whenever $\phi \in B(\phi_0,r_n)$, for any $p \in \mathbb{S}^d$ there exists an $\varepsilon_n > 0$ such that $\Xi(p,\phi) \subset \Xi^{\varepsilon_n}(p,\phi_0)$, where $\varepsilon_n \to 0$ as $r_n \to 0$. This implies that

$$\sup_{\theta\in\Xi(p,\phi)}\|\theta - \theta^*(p)\| \le \sup_{\theta\in\Xi^{\varepsilon_n}(p,\phi_0)}\|\theta - \theta^*(p)\| < \varepsilon_n.$$

Now, fix the sequence $\varepsilon_n$, denote $f_\phi(p) := \sup_{\theta\in\Xi(p,\phi)}\|\theta - \theta^*(p)\|$ for $\phi \in B(\phi_0,r_n)$ and

$$A_{\delta_p} := \{p \in W;\ f_\phi(p) < \varepsilon_n,\ \forall\phi \in B(\phi_0,r_n) \text{ with } r_n < \delta_p\}$$

for every $p \in W$. This means that for any $p \in W$ there is a $\delta_p$ so that $p \in A_{\delta_p}$. Since $f_\phi(p)$ is continuous in $p$ (by the result of lemma G.4), $A_{\delta_p}$ is an open set and $\bigcup_{p\in W} A_{\delta_p}$ is an open cover of $W$: $W \subset \bigcup_{p\in W} A_{\delta_p}$.

Due to the compactness of $W$ there exists a finite set $\{\delta_1,\ldots,\delta_K\}$, $K < \infty$, such that $\{A_{\delta_i}\}_{i=1}^K$ is a subcover of $W$: $W \subset \bigcup_{i=1}^K A_{\delta_i}$. Let $\delta^* = \min\{\delta_1,\ldots,\delta_K\}$ so that $A_{\delta_i} \subset A_{\delta^*}$ for every $i = 1,\ldots,K$ and for any $p \in W$ we have $p \in A_{\delta^*}$. This then implies that for any $r_n < \delta^*$

$$\sup_{p\in W}\ \sup_{\theta\in\Xi(p,\phi)}\|\theta - \theta^*(p)\| < \varepsilon_n.$$

□

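Lemma G.6. Let assumptions 5.1, 5.2 and 5.3 (ii)-(iii) be satisfied with $\delta = r_n$. Let $W \subset \mathbb{S}^d$ be compact and $\Xi(p,\phi_0)$ be a singleton $\forall p \in W$. Moreover, let $\varepsilon_n > 0$ satisfy assumption 5.6 (iv). Then for $\theta^*(p) = \Xi(p,\phi_0)$,

$$\sup_{\phi\in B(\phi_0,r_n)}\ \sup_{p\in W}\ \sup_{\theta\in\Xi(p,\phi)}\|\theta - \theta^*(p)\| = O(r_n).$$

Proof. By lemma G.5 we know that $\sup_{p\in W}\sup_{\theta\in\Xi(p,\phi)}\|\theta - \theta^*(p)\| < \varepsilon_n$, where now $\varepsilon_n$ is chosen such that $\varepsilon_n = O(r_n)$. Such an $\varepsilon_n$ exists by assumption 5.6 (iv). Next, for $\phi \in B(\phi_0,r_n)$, denote $f(\phi) := \sup_{p\in W}\sup_{\theta\in\Xi(p,\phi)}\|\theta - \theta^*(p)\|$ and

$$A_{\delta_\phi} := \{\phi \in B(\phi_0,r_n);\ f(\phi) < \varepsilon_n,\ \forall r_n < \delta_\phi\}.$$

Remark that $\delta_\phi$ must be such that $\phi \in A_{\delta_\phi}$ for every $\phi \in B(\phi_0,r_n)$. Since $f(\phi)$ is continuous in $\phi$, $A_{\delta_\phi}$ is an open set and $\bigcup_{\phi\in B(\phi_0,r_n)} A_{\delta_\phi}$ is an open cover of $B(\phi_0,r_n)$, that is, $B(\phi_0,r_n) \subset \bigcup_{\phi\in B(\phi_0,r_n)} A_{\delta_\phi}$. Due to compactness of $B(\phi_0,r_n)$ there exists a finite set $\{\delta_1,\ldots,\delta_K\}$, $K < \infty$, such that $\{A_{\delta_i}\}_{i=1}^K$ is a subcover of $B(\phi_0,r_n)$, that is,

$$B(\phi_0,r_n) \subset \bigcup_{i=1}^K A_{\delta_i}.$$

Let $\delta^* = \min\{\delta_1,\ldots,\delta_K\}$ so that $A_{\delta_i} \subset A_{\delta^*}$ for every $i = 1,\ldots,K$ and for every $\phi \in B(\phi_0,r_n)$ we have $\phi \in A_{\delta^*}$. This then implies that for any $r_n < \delta^*$

$$\sup_{\phi\in B(\phi_0,r_n)}\ \sup_{p\in W}\ \sup_{\theta\in\Xi(p,\phi)}\|\theta - \theta^*(p)\| = O(r_n).$$

Q.E.D.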

Lemma G.7. Let assumptions 5.1, 5.2, 5.3 (iii) and 5.4 (i) be satisfied with $\delta = r_n$. For every $\theta \in \Theta(\phi_0)$ denote by $Act(\theta,\phi_0) := \{i;\ \Psi_i(\theta,\phi_0) = 0\}$ the set of constraints in $\Psi(\theta,\phi_0) \le 0$ which are active and by $d_A(\theta,\phi)$ the number of elements in $Act(\theta,\phi)$. Then, there exists an $N$ such that $d_A(\theta,\phi) \le d$ for every $\theta \in \Theta(\phi)$, $\phi \in B(\phi_0,r_n)$ and $n > N$.

Proof. Consider the correspondence $\Theta(\cdot) : B(\phi_0,r_n) \to \Theta$ and a net $(\phi_\alpha,\theta_\alpha)$ in the graph of $\Theta(\phi)$, that is, $\phi_\alpha \in B(\phi_0,r_n)$ and $\theta_\alpha \in \Theta(\phi_\alpha)$. Under Assumption 5.2, $\Theta(\phi)$ is compact $\forall\phi \in B(\phi_0,r_n)$ since it is a closed subset of a compact space. Because, by lemma G.3, the correspondence $\Theta(\cdot) : B(\phi_0,r_n) \to \Theta$ is upper hemicontinuous, it follows by Theorem 17.16 in Aliprantis and Border (2006) and equation (G.2) that for every $i \in Act^c(\theta,\phi_0)$, for any $\varepsilon > 0$, there exists an $N$ such that for every $\phi_\alpha \in B(\phi_0,r_n)$ with $n > N$ we have

$$|\Psi_i(\theta_\alpha,\phi_\alpha) - \Psi_i(\theta^*,\phi_0)| < \varepsilon, \qquad \Psi_i(\theta_\alpha,\phi_\alpha) < 0,$$

where $\theta^* \in \Theta(\phi_0)$. This is because, since $\Psi_i(\theta^*,\phi_0) < 0$ for every $i \in Act^c(\theta,\phi_0)$, there exists an $\varepsilon_i > 0$ such that $\Psi_i(\theta^*,\phi_0) < -\varepsilon_i$ and, for any $\varepsilon < \varepsilon_i$, we can always find an $N$ such that $\Psi_i(\theta_\alpha,\phi_\alpha) < 0$ for every $\phi_\alpha \in B(\phi_0,r_n)$ and $n > N$.

This means that for every $\theta^* \in \Theta(\phi_0)$, $Act^c(\theta,\phi_0) \subset Act^c(\theta_\alpha,\phi_\alpha)$ for every $\phi_\alpha \in B(\phi_0,r_n)$ and $n$ sufficiently large. Therefore, the reverse inclusion holds for the complements of these sets:

$$Act(\theta_\alpha,\phi_\alpha) \subset Act(\theta,\phi_0), \qquad \theta \in \Theta(\phi_0),\ \theta_\alpha \in \Theta(\phi_\alpha) \tag{G.3}$$

for every $\phi_\alpha \in B(\phi_0,r_n)$ and $n$ sufficiently large. By assumption 5.4 (i) we have $d_A(\theta,\phi_0) \le d$ which, together with (G.3), implies that for any $\theta \in \Theta(\phi)$, $d_A(\theta,\phi) \le d$ for every $\phi \in B(\phi_0,r_n)$ and $n$ sufficiently large. Q.E.D.

Lemma G.8. Let Assumptions 5.1, 5.2, 5.3 (iii), 5.3 (v) and 5.4 (ii) be satisfied with $\delta = r_n$. Then, there exists an $N$ such that for every $n > N$, $\phi \in B(\phi_0,r_n)$ and $\theta \in \Theta(\phi)$ the vectors

$$\{\nabla_\theta\Psi_i(\theta,\phi)\}_{i\in Act(\theta,\phi)}$$

are linearly independent.

Proof. By (G.3) we have the inclusion $Act(\theta,\phi) \subset Act(\theta,\phi_0)$, for every $\theta \in \Theta(\phi_0)$, $\phi \in B(\phi_0,r_n)$ and $n$ sufficiently large. Therefore, we can prove the result by considering the indices in the biggest set $Act(\theta,\phi_0)$.

Since, by lemma G.3, the correspondence $\phi \mapsto \Theta(\phi)$ is upper hemicontinuous, for every net $\{\phi_\alpha,\theta_\alpha\}$ in the graph of $\Theta(\cdot)$ (i.e. such that $\theta_\alpha \in \Theta(\phi_\alpha)$, $\forall\alpha$ such that $\phi_\alpha \in B(\phi_0,r_n)$) we have that if $\phi_\alpha \to \phi_0$ then $\theta_\alpha \to \theta^*$, where $\theta^*$ is some element of $\Theta(\phi_0)$ (see e.g. Theorem 17.16 in Aliprantis and Border (2006)). Because by assumption 5.3 (v) the vectors $\nabla_\theta\Psi_i(\theta,\phi)$, $i \in Act(\theta,\phi_0)$ with $\theta \in \Theta(\phi_0)$, are continuous in $(\theta,\phi)$, it follows that

$$\nabla_\theta\Psi_i(\theta_\alpha,\phi_\alpha) \to \nabla_\theta\Psi_i(\theta^*,\phi_0), \qquad \forall i \in Act(\theta,\phi_0),\ \theta \in \Theta(\phi_0). \tag{G.4}$$

Now, denote by $\nabla_\theta\Psi^A(\theta,\phi)$ the $(d \times d_A)$ matrix obtained by stacking columnwise the vectors $\{\nabla_\theta\Psi_i(\theta,\phi)\}_{i\in Act(\theta,\phi_0)}$ and by $\{\rho_i(\theta,\phi)\}_{i\in Act(\theta,\phi_0)}$ its singular values. By assumption 5.4 (ii) the matrix $\nabla_\theta\Psi^A(\theta,\phi_0)$ has full column rank, so there exists an $\varepsilon > 0$ such that $\inf_{i\in Act(\theta,\phi_0)}\rho_i(\theta,\phi_0) > \varepsilon$. Continuity of the singular values (which follows from the continuity of $\nabla_\theta\Psi^A(\theta,\phi)$, see e.g. Theorem II.5.1 in Kato (1995)) and (G.4) imply

$$\rho_i(\theta_\alpha,\phi_\alpha) \to \rho_i(\theta^*,\phi_0), \qquad \forall i \in Act(\theta,\phi_0),\ \theta \in \Theta(\phi_0).$$

We conclude that there exists an $N$ such that for every $n > N$, $\phi \in B(\phi_0,r_n)$ and $\theta \in \Theta(\phi)$ the singular values $\{\rho_i(\theta,\phi)\}_{i\in Act(\theta,\phi_0)}$ are strictly positive, which implies that $\nabla_\theta\Psi^A(\theta,\phi)$ has full column rank. Hence, $\{\nabla_\theta\Psi_i(\theta,\phi)\}_{i\in Act(\theta,\phi)}$ are linearly independent. Q.E.D.

Lemma G.9. Let assumptions 5.1, 5.2, 5.3 (iii), 5.3 (v) and 5.4 (ii) hold with $\delta = r_n$. Then, there exists an $N$ such that for every $n > N$, $\phi \in B(\phi_0,r_n)$ and $p \in \mathbb{S}^d$ there exists a unique $\lambda(p,\phi) \in \mathbb{R}^k_+$ satisfying

$$\sup_{\theta\in\Theta(\phi)} p^T\theta = \sup_{\theta\in\Theta}\{p^T\theta - \lambda(p,\phi)^T\Psi(\theta,\phi)\}. \tag{G.5}$$

Proof. For every $\phi \in \Phi$, $\Theta(\phi)$ is a compact set since it is a closed subset of $\Theta$, which is compact under assumption 5.2. This implies that $\sup_{\theta\in\Theta(\phi)}\langle p,\theta\rangle$ is finite. Hence, if assumption 5.3 (iii) holds, the conditions of Corollary 28.2.1 in Rockafellar (1970) are satisfied and by applying this corollary we obtain that, for every $\phi \in B(\phi_0,r_n)$ and $p \in \mathbb{S}^d$, there exists $\lambda(p,\phi)$ satisfying equation (G.5).

In order to show uniqueness of $\lambda(p,\phi)$, suppose that for $(\phi,p) \in B(\phi_0,r_n) \times \mathbb{S}^d$ there exist two different vectors $\lambda_1(p,\phi)$ and $\lambda_2(p,\phi)$ that satisfy equation (G.5). Since assumption 5.3 (iv) implies $\Xi(p,\phi) \subset \mathrm{int}\,\Theta$ for all $(p,\phi) \in \mathbb{S}^d \times B(\phi_0,r_n)$, where $\mathrm{int}\,\Theta$ denotes the interior of $\Theta$, it follows that for any $\theta \in \Xi(p,\phi)$, $\lambda_1(p,\phi)$ and $\lambda_2(p,\phi)$ satisfy the first order condition:

$$p - \nabla_\theta\Psi(\theta,\phi)\lambda_1(p,\phi) = p - \nabla_\theta\Psi(\theta,\phi)\lambda_2(p,\phi) = 0. \tag{G.6}$$

By the complementary slackness condition the Lagrange multipliers of the non-binding constraints are equal to 0. Therefore, equation (G.6) simplifies to

$$p - \sum_{i=1}^{d_A}\lambda_1^i(p,\phi)\nabla_\theta\Psi_i(\theta,\phi) = p - \sum_{i=1}^{d_A}\lambda_2^i(p,\phi)\nabla_\theta\Psi_i(\theta,\phi) = 0 \tag{G.7}$$

which, after simplifications, gives

$$\sum_{i=1}^{d_A}\left(\lambda_1^i(p,\phi) - \lambda_2^i(p,\phi)\right)\nabla_\theta\Psi_i(\theta,\phi) = 0. \tag{G.8}$$

By lemma G.8 the vectors $\{\nabla_\theta\Psi_i(\theta,\phi)\}_{i\in Act(\theta,\phi)}$ are linearly independent for $\phi \in B(\phi_0,r_n)$, $\theta \in \Theta(\phi)$ and $n$ sufficiently large. Therefore, the same holds for $\theta \in \Xi(p,\phi)$ with $p \in \mathbb{S}^d$ since $\Xi(p,\phi) \subset \Theta(\phi)$. This and (G.8) contradict $\lambda_1(p,\phi) \ne \lambda_2(p,\phi)$. Q.E.D.
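The uniqueness argument above rests on the linear independence of the active constraint gradients, which the proof of Lemma G.8 characterises through strictly positive singular values. A minimal numerical sketch of that check follows, assuming hypothetical constraints $\Psi(\theta,\phi) = (\|\theta\|^2 - \phi_1,\ \theta_1 - \phi_2)$ that are not taken from the paper; the helper names are illustrative.

```python
import numpy as np


def active_gradient_matrix(theta, phi, tol=1e-10):
    """Stack columnwise the theta-gradients of the constraints active at (theta, phi)."""
    values = np.array([theta @ theta - phi[0], theta[0] - phi[1]])
    grads = [2.0 * theta, np.array([1.0, 0.0])]
    cols = [g for g, v in zip(grads, values) if abs(v) <= tol]
    return np.column_stack(cols) if cols else np.zeros((theta.size, 0))


def smallest_singular_value(G):
    """Smallest singular value of G; it is positive iff the columns are independent."""
    if G.shape[1] == 0:
        return np.inf
    return float(np.linalg.svd(G, compute_uv=False).min())
```

At $\theta = (0.6, 0.8)$ with $\phi = (1, 0.6)$ both constraints are active and their gradients are independent, whereas at $\theta = (1, 0)$ with $\phi = (1, 1)$ the two gradients are collinear and the smallest singular value vanishes, so the multipliers would not be uniquely determined there.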

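Lemma G.10. Let Assumptions 5.1, 5.2, 5.3 (ii)-(v) and 5.4 (ii) hold with $\delta = r_n$. Then, there exists an $N$ such that for every $n > N$ the vector $\lambda(p,\phi)$ is continuous in $(p,\phi) \in \mathbb{S}^d \times B(\phi_0,r_n)$.

Proof. For every $\phi \in B(\phi_0,r_n)$ and $\theta \in \Theta(\phi)$, denote by $\lambda^A(p,\phi)$ the $d_A$-vector with components $\{\lambda^i(p,\phi)\}_{i\in Act(\theta,\phi)}$ and by $\nabla_\theta\Psi^A(\theta,\phi)$ the $(d \times d_A)$ matrix obtained by stacking columnwise the vectors $\{\nabla_\theta\Psi_i(\theta,\phi)\}_{i\in Act(\theta,\phi)}$. By lemma G.8 there exists an $N$ such that $\forall n > N$, $\phi \in B(\phi_0,r_n)$ and $\theta \in \Theta(\phi)$, the matrix $[\nabla_\theta\Psi^A(\theta,\phi)]^T\nabla_\theta\Psi^A(\theta,\phi)$ is invertible. It follows from the first order condition in (G.7), which is valid under assumption 5.3 (iv), that for $n$ sufficiently large we can write

$$\lambda^A(p,\phi) = \left([\nabla_\theta\Psi^A(\theta,\phi)]^T\nabla_\theta\Psi^A(\theta,\phi)\right)^{-1}[\nabla_\theta\Psi^A(\theta,\phi)]^T p,$$

$\forall(p,\phi) \in \mathbb{S}^d \times B(\phi_0,r_n)$ and $\theta \in \Xi(p,\phi)$. Since $\lambda^i(p,\phi) = 0$ for every $i \notin Act(\theta,\phi)$, then $\|\lambda(p,\phi)\| = \|\lambda^A(p,\phi)\|$ and

$$\sup_{p\in\mathbb{S}^d}\sup_{\phi\in B(\phi_0,r_n)}\|\lambda(p,\phi)\| = \sup_{p\in\mathbb{S}^d}\sup_{\phi\in B(\phi_0,r_n)}\|\lambda^A(p,\phi)\|$$
$$\le \sup_{p\in\mathbb{S}^d}\sup_{\phi\in B(\phi_0,r_n)}\left\|\left([\nabla_\theta\Psi^A(\tilde\theta(p,\phi),\phi)]^T\nabla_\theta\Psi^A(\tilde\theta(p,\phi),\phi)\right)^{-1}\right\|\,\left\|\nabla_\theta\Psi^A(\tilde\theta(p,\phi),\phi)\right\|$$
$$\le \sup_{p\in\mathbb{S}^d}\sup_{\phi\in B(\phi_0,r_n)}\left(\rho(\tilde\theta(p,\phi),\phi)\right)^{-2}\ \sup_{p\in\mathbb{S}^d}\sup_{\phi\in B(\phi_0,r_n)}\left\|\nabla_\theta\Psi^A(\tilde\theta(p,\phi),\phi)\right\|$$

where $\tilde\theta(p,\phi) \in \Xi(p,\phi)$ and $\rho(\theta,\phi)$ denotes the smallest singular value of $\nabla_\theta\Psi^A(\theta,\phi)$. Under assumption 5.3 (v), and because $Act(\theta,\phi) \subset Act(\theta,\phi_0)$, $\forall\theta \in \Theta(\phi)$, $\phi \in B(\phi_0,r_n)$ and $n$ sufficiently large under assumption 5.4 (ii), the matrix $\nabla_\theta\Psi^A(\theta,\phi)$ is continuous in $(\theta,\phi)$. Further, because $\Xi(p,\phi) \subset \Theta(\phi)$ and $\mathbb{S}^d \times B(\phi_0,r_n) \times \Xi(p,\phi)$ is compact (compactness of $\Xi(p,\phi)$ follows from lemma G.4), it follows from the extreme value theorem and lemma G.4 that $\|\nabla_\theta\Psi^A(\theta,\phi)\|$ attains its maximum value on $\mathbb{S}^d \times B(\phi_0,r_n) \times \Xi(p,\phi)$, so that there exists a constant $0 < C_1 < \infty$ such that

$$\sup_{p\in\mathbb{S}^d}\sup_{\phi\in B(\phi_0,r_n)}\left\|\nabla_\theta\Psi^A(\tilde\theta(p,\phi),\phi)\right\| \le \sup_{p\in\mathbb{S}^d}\sup_{\phi\in B(\phi_0,r_n)}\sup_{\theta\in\Xi(p,\phi)}\left\|\nabla_\theta\Psi^A(\theta,\phi)\right\| < C_1.$$

Continuity of the singular values (which follows from the continuity of $\nabla_\theta\Psi^A(\theta,\phi)$, see e.g. Theorem II.5.1 in Kato (1995)) and compactness of $\mathbb{S}^d \times B(\phi_0,r_n) \times \Xi(p,\phi)$ imply that there exists a constant $0 < C_2 < \infty$ such that

$$\sup_{p\in\mathbb{S}^d}\sup_{\phi\in B(\phi_0,r_n)}\left(\rho(\tilde\theta(p,\phi),\phi)\right)^{-2} \le \sup_{p\in\mathbb{S}^d}\sup_{\phi\in B(\phi_0,r_n)}\sup_{\theta\in\Xi(p,\phi)}\left(\rho(\theta,\phi)\right)^{-2} < C_2.$$

This shows that $\lambda(p,\phi)$ is uniformly bounded in $(p,\phi) \in \mathbb{S}^d \times B(\phi_0,r_n)$, which implies that it is continuous. Q.E.D.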
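The closed form for the active multipliers used in the proof, $\lambda^A = ([\nabla_\theta\Psi^A]^T\nabla_\theta\Psi^A)^{-1}[\nabla_\theta\Psi^A]^T p$, is straightforward to evaluate numerically. The sketch below assumes a hypothetical identified set, the unit disc $\Psi(\theta,\phi) = \|\theta - \phi\|^2 - 1 \le 0$, which is not an example from the paper; for a unit vector $p$ the support point is $\phi + p$ and the formula gives $\lambda = 1/2$. Function names are illustrative.

```python
import numpy as np


def active_multipliers(G, p):
    """Solve (G^T G) lam = G^T p, i.e. lam = (G^T G)^{-1} G^T p, for full column rank G."""
    return np.linalg.solve(G.T @ G, G.T @ p)


def disc_example(p, centre):
    """Unit disc Psi(theta, phi) = ||theta - phi||^2 - 1 <= 0 with phi = centre."""
    theta_star = centre + p  # maximiser of p^T theta over the disc when ||p|| = 1
    G = (2.0 * (theta_star - centre)).reshape(-1, 1)  # gradient of the active constraint
    lam = active_multipliers(G, p)
    residual = p - G @ lam  # first order condition: p - G lam = 0
    return theta_star, lam, residual
```

Using `np.linalg.solve` on the normal equations mirrors the formula in the proof; it is well posed exactly when the smallest singular value of the gradient matrix is bounded away from zero, which is what yields the uniform bound on $\lambda(p,\phi)$.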

Lemma G.11. Let assumptions 5.1, 5.2, 5.3 (ii)-(v) and 5.4 (ii) hold with $\delta = r_n$. Then, there exists an $N$ such that for every $n > N$ and for any $\varepsilon_n > 0$ which converges to 0 as $r_n \to 0$ we have

$$\sup_{p\in\mathbb{S}^d}\ \sup_{\phi\in B(\phi_0,r_n)}\|\lambda(p,\phi) - \lambda(p,\phi_0)\| = O(\varepsilon_n).$$

Proof. By lemmas G.9 and G.10 there exists an $N$ such that for every $n > N$ the function $\lambda : \mathbb{S}^d \times B(\phi_0,r_n) \to \mathbb{R}^k_+$ is singleton valued and continuous in $(p,\phi) \in \mathbb{S}^d \times B(\phi_0,r_n)$. Therefore, by compactness of $B(\phi_0,r_n)$ the function $\phi \mapsto \lambda(p,\phi)$ is uniformly continuous on $B(\phi_0,r_n)$ for every $p \in \mathbb{S}^d$. This means that for every $p$ and any $\varepsilon_n > 0$ which converges to 0 as $r_n \to 0$ there exists a natural number $N_p$ that depends on $p$ such that for all $n > N_p$,

$$\sup_{\phi\in B(\phi_0,r_n)}\|\lambda(p,\phi) - \lambda(p,\phi_0)\| < \varepsilon_n. \tag{G.9}$$

For a fixed $\varepsilon_n$ define $f_n(p) := \sup_{\phi\in B(\phi_0,r_n)}\|\lambda(p,\phi) - \lambda(p,\phi_0)\|$ and

$$A_{N_p} := \{p \in \mathbb{S}^d;\ f_n(p) < \varepsilon_n,\ \forall n > N_p\}$$

for every $p \in \mathbb{S}^d$. This means that for any $p \in \mathbb{S}^d$ there is an $N_p$ so that $p \in A_{N_p}$. Since $f_n(p)$ is continuous in $p$, $A_{N_p}$ is an open set and $\bigcup_{p\in\mathbb{S}^d} A_{N_p}$ is an open cover of $\mathbb{S}^d$: $\mathbb{S}^d \subset \bigcup_{p\in\mathbb{S}^d} A_{N_p}$.
Due to compactness of $\mathbb{S}^d$ there exists a finite set $\{N_1,\ldots,N_K\}$, $K < \infty$, such that $\{A_{N_i}\}_{i=1}^K$ is a subcover of $\mathbb{S}^d$: $\mathbb{S}^d \subset \bigcup_{i=1}^K A_{N_i}$.

Let $N^* = \max\{N_1,\ldots,N_K\}$ so that $A_{N_i} \subset A_{N^*}$ for every $i = 1,\ldots,K$ and for any $p \in \mathbb{S}^d$ we have $p \in A_{N^*}$. Remark that this $N^*$ does not depend on $p$. This then implies that for any $n > N^*$

$$\sup_{p\in\mathbb{S}^d}\ \sup_{\phi\in B(\phi_0,r_n)}\|\lambda(p,\phi) - \lambda(p,\phi_0)\| < \varepsilon_n.$$

□

Lemma G.12. Let Assumptions 5.1, 5.2, 5.3 (ii)-(iii) hold with $\delta = r_n$. If $\Psi(\theta,\phi_0)$ contains a subvector of strictly convex functions $\Psi_S(\cdot,\phi_0)$ of $\theta$, then there exists an $N$ such that for every $n > N$ and any $\phi \in B(\phi_0,r_n)$, $p \in \mathbb{S}^d$ for which $\Xi(p,\phi)$ is not a singleton we have

$$\Psi_S(\theta,\phi) < 0 \quad \text{for some } \theta \in \Xi(p,\phi).$$

Proof. By lemma 5.1 there exists an $N$ such that for every $n > N$ and $\phi \in B(\phi_0,r_n)$ the function $\theta \mapsto \Psi_S(\theta,\phi)$ is strictly convex. Moreover, since by lemma G.4 $\Xi(p,\phi)$ is compact for every $(p,\phi) \in \mathbb{S}^d \times B(\phi_0,r_n)$ for $n$ sufficiently large, $\Xi(p,\phi)$ is closed and bounded (by the Heine-Borel theorem) and convex (since it is a closed subset of the convex sets $\Theta(\phi)$). For some $\theta_1,\theta_2 \in \Xi(p,\phi)$, $\theta_1 \ne \theta_2$, and $\nu \in (0,1)$ define $\bar\theta := \nu\theta_1 + (1-\nu)\theta_2$. It follows that $\bar\theta$ belongs to $\Xi(p,\phi)$. Since $\Psi_S(\cdot,\phi)$ is strictly convex in $\theta$ for every $\phi \in B(\phi_0,r_n)$ we conclude that

$$\Psi_S(\bar\theta,\phi) < \nu\Psi_S(\theta_1,\phi) + (1-\nu)\Psi_S(\theta_2,\phi) \le 0$$

where the last inequality is due to the fact that $\theta_1,\theta_2 \in \Xi(p,\phi) \subset \Theta(\phi)$. Q.E.D.
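The strict-convexity inequality at the end of the proof can be illustrated numerically. The sketch below assumes a hypothetical strictly convex constraint $\Psi_S(\theta,\phi) = \|\theta\|^2 - \phi$, which is not taken from the paper, and compares its value at a convex combination of two feasible points with the corresponding combination of its values. It only illustrates the inequality itself; it does not construct a support set.

```python
import numpy as np


def psi_s(theta, phi):
    """Hypothetical strictly convex constraint: ||theta||^2 - phi <= 0."""
    return float(theta @ theta - phi)


def convex_combination_gap(theta1, theta2, phi, nu=0.5):
    """Return (Psi_S at the combination, combination of the Psi_S values)."""
    theta_bar = nu * theta1 + (1.0 - nu) * theta2
    left = psi_s(theta_bar, phi)
    right = nu * psi_s(theta1, phi) + (1.0 - nu) * psi_s(theta2, phi)
    return left, right
```

With two distinct boundary points, for instance $(1,0)$ and $(0,1)$ when $\phi = 1$, the constraint is binding at both end points but strictly slack at their midpoint, which is why the multiplier attached to a strictly convex constraint must vanish whenever the support set is not a singleton.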

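Lemma G.13. Let $\tau_0 : \mathbb{S}^d \to (0,1)$ be a measurable and differentiable function of $p$. For every $\phi_1,\phi_2 \in B(\phi_0,r_n)$ define $\phi_{\tau_0(p)} = \tau_0(p)\phi_1 + (1-\tau_0(p))\phi_2$. Let assumptions 5.1, 5.2, 5.3 (i)-(v), 5.4 (ii) and 5.5 hold. Then, there exists an $N$ such that $\forall n > N$ there exists a constant $\Lambda(\phi_0) > 0$ and an $\varepsilon_n > 0$ such that $\varepsilon_n \to 0$ as $r_n \to 0$ and:

$$\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)}\ \sup_{p\in\mathbb{S}^d}\ \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})}\left\|\left[\nabla_\phi\Psi(\theta,\phi_{\tau_0(p)}) - \nabla_\phi\Psi(\theta^*(p),\phi_0)\right]^T\lambda(p,\phi_0)\right\| < \Lambda(\phi_0)\varepsilon_n.$$

Proof. First, remark that

$$\left\|\left[\nabla_\phi\Psi(\theta,\phi_{\tau_0(p)}) - \nabla_\phi\Psi(\theta^*(p),\phi_0)\right]^T\lambda(p,\phi_0)\right\| \le \left\|\left[\nabla_\phi\Psi(\theta,\phi_{\tau_0(p)}) - \nabla_\phi\Psi(\theta,\phi_0)\right]^T\lambda(p,\phi_0)\right\|$$
$$+ \left\|\left[\nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta^*(p),\phi_0)\right]^T\lambda(p,\phi_0)\right\| =: A_1 + A_2 \tag{G.10}$$

and the function $p \mapsto \lambda(p,\phi_0)$ is continuous in $p \in \mathbb{S}^d$ by lemmas G.9 and G.10. Therefore, it attains its supremum. We start by analyzing the first term $A_1$:

$$\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)}\ \sup_{p\in\mathbb{S}^d}\ \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})}\left\|\left[\nabla_\phi\Psi(\theta,\phi_{\tau_0(p)}) - \nabla_\phi\Psi(\theta,\phi_0)\right]^T\lambda(p,\phi_0)\right\|$$
$$\le \sup_{\phi\in B(\phi_0,r_n)}\ \sup_{\theta\in\Theta}\left\|\nabla_\phi\Psi(\theta,\phi) - \nabla_\phi\Psi(\theta,\phi_0)\right\|\ \sup_{p\in\mathbb{S}^d}\|\lambda(p,\phi_0)\|$$

since by convexity of $B(\phi_0,r_n)$ we have that $\phi_1,\phi_2 \in B(\phi_0,r_n)$ implies $\phi_{\tau_0(p)} \in B(\phi_0,r_n)$, $\forall p \in \mathbb{S}^d$. In order to show convergence to zero of this term we follow the proof of lemma G.11; therefore we shorten explanations. By compactness of $B(\phi_0,r_n)$ and under assumption 5.3 (i), the function $\phi \mapsto \nabla_\phi\Psi(\theta,\phi)$ is uniformly continuous on $B(\phi_0,r_n)$ for every $\theta \in \Theta$. Hence, for every $\theta$ and any $\varepsilon_n > 0$ which converges to 0 as $r_n \to 0$ there exists a natural number $N_\theta$ that depends on $\theta$ such that for all $n > N_\theta$ we have

$$f_n(\theta) := \sup_{\phi\in B(\phi_0,r_n)}\left\|\nabla_\phi\Psi(\theta,\phi) - \nabla_\phi\Psi(\theta,\phi_0)\right\| < \varepsilon_n. \tag{G.11}$$

Define $A_{N_\theta} := \{\theta \in \Theta;\ f_n(\theta) < \varepsilon_n,\ \forall n > N_\theta\}$ for every $\theta \in \Theta$. Since $f_n(\theta)$ is continuous in $\theta$, $A_{N_\theta}$ is an open set and $\bigcup_{\theta\in\Theta} A_{N_\theta}$ is an open cover of $\Theta$. Due to compactness of $\Theta$ there exists a finite set $\{N_1,\ldots,N_K\}$, $K < \infty$, such that $\{A_{N_i}\}_{i=1}^K$ is a subcover of $\Theta$: $\Theta \subset \bigcup_{i=1}^K A_{N_i}$.

Let $N^* = \max\{N_1,\ldots,N_K\}$ so that $A_{N_i} \subset A_{N^*}$ for every $i = 1,\ldots,K$ and for any $\theta \in \Theta$ we have $\theta \in A_{N^*}$. Remark that this $N^*$ does not depend on $\theta$, so for any $n > N^*$,

$$\sup_{\theta\in\Theta}\ \sup_{\phi\in B(\phi_0,r_n)}\left\|\nabla_\phi\Psi(\theta,\phi) - \nabla_\phi\Psi(\theta,\phi_0)\right\| < \varepsilon_n.$$

We conclude that the first term is upper bounded by $\varepsilon_n$.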

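Now we analyze the second term $A_2$ of (G.10). We have to consider two cases: I. for every $(p,\phi) \in \mathbb{S}^d \times B(\phi_0,r_n)$ the set $\Xi(p,\phi)$ is a singleton; II. for some $(p,\phi) \in \mathbb{S}^d \times B(\phi_0,r_n)$ the set $\Xi(p,\phi)$ is not a singleton.

Case I. This case corresponds to the situation where assumption 5.5 (i) holds. Hence, the correspondences $\theta^*(p)$ and $\Xi(p,\phi_{\tau_0(p)})$ are single-valued and

$$\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)}\ \sup_{p\in\mathbb{S}^d}\ \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})}\left\|\left[\nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta^*(p),\phi_0)\right]^T\lambda(p,\phi_0)\right\|$$
$$= \sup_{\phi_1,\phi_2\in B(\phi_0,r_n)}\ \sup_{p\in\mathbb{S}^d}\left\|\left[\nabla_\phi\Psi(\Xi(p,\phi_{\tau_0(p)}),\phi_0) - \nabla_\phi\Psi(\Xi(p,\phi_0),\phi_0)\right]^T\lambda(p,\phi_0)\right\|.$$

By Lemma G.4, $\theta = \Xi(p,\phi_{\tau_0(p)})$ is a continuous function of $p$ and by assumption 5.3 (i) the matrix $\nabla_\phi\Psi(\theta,\phi)$ is continuous in $\theta$. Because the composition of two continuous functions is continuous, this implies that $\nabla_\phi\Psi(\Xi(\cdot,\phi_{\tau_0(\cdot)}),\phi_0)$ is a continuous function on $\mathbb{S}^d \times B(\phi_0,r_n)$. Moreover, the compactness of $\mathbb{S}^d \times B(\phi_0,r_n)$ implies that $\nabla_\phi\Psi(\Xi(\cdot,\phi_{\tau_0(\cdot)}),\phi_0)$ is uniformly continuous in $(p,\phi_1,\phi_2)$. From this and the fact that $\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)}\sup_{p\in\mathbb{S}^d}\|\phi_{\tau_0(p)} - \phi_0\| < r_n$, it follows that there exists an $\varepsilon_n > 0$ such that

$$\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)}\ \sup_{p\in\mathbb{S}^d}\left\|\nabla_\phi\Psi(\Xi(p,\phi_{\tau_0(p)}),\phi_0) - \nabla_\phi\Psi(\Xi(p,\phi_0),\phi_0)\right\| < \varepsilon_n.$$

Remark that this $\varepsilon_n$ converges to 0 as $r_n \to 0$. Finally, because $\lambda(p,\phi_0)$ is uniformly bounded in $p$ by some constant, say $\Lambda_1(\phi_0) > 0$ (by Lemma G.10 and by compactness of $\mathbb{S}^d$), we conclude that

$$\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)}\ \sup_{p\in\mathbb{S}^d}\ \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})}\left\|\left[\nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta^*(p),\phi_0)\right]^T\lambda(p,\phi_0)\right\|$$
$$\le \sup_{\phi_1,\phi_2\in B(\phi_0,r_n)}\ \sup_{p\in\mathbb{S}^d}\left\|\nabla_\phi\Psi(\Xi(p,\phi_{\tau_0(p)}),\phi_0) - \nabla_\phi\Psi(\Xi(p,\phi_0),\phi_0)\right\|\ \sup_{p\in\mathbb{S}^d}\|\lambda(p,\phi_0)\| \le \Lambda_1(\phi_0)\varepsilon_n.$$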

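Case II. This case corresponds to the situation where assumption 5.5 (ii) holds. For some $\delta_n := \delta(r_n) > 0$ that converges to 0 with $r_n$ define:

$$\mathbb{S}^d_{ns} := \{p \in \mathbb{S}^d;\ \Xi(p,\phi_0) \text{ is not a singleton}\}, \qquad \mathbb{S}^{\delta_n}_{ns} := \Big\{p \in \mathbb{S}^d;\ \inf_{\tilde p\in\mathbb{S}^d_{ns}}\|p - \tilde p\| < \delta_n\Big\},$$

and $\mathbb{S}^d_{ns} \subset \mathbb{S}^{\delta_n}_{ns} \subset \mathbb{S}^d$. Therefore,

$$\sup_{p\in\mathbb{S}^d}\ \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})}\left\|\left[\nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta^*(p),\phi_0)\right]^T\lambda(p,\phi_0)\right\| \le$$
$$\sup_{p\in\mathbb{S}^{\delta_n}_{ns}}\ \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})}\left\|\left[\nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta^*(p),\phi_0)\right]^T\lambda(p,\phi_0)\right\|$$
$$+ \sup_{p\in(\mathbb{S}^{\delta_n}_{ns})^c}\ \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})}\left\|\left[\nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta^*(p),\phi_0)\right]^T\lambda(p,\phi_0)\right\| =: B_1 + B_2 \tag{G.12}$$

where $(\mathbb{S}^{\delta_n}_{ns})^c$ denotes the complement of $\mathbb{S}^{\delta_n}_{ns}$ and is a closed and compact set.

We start by analyzing term $B_1$. By lemma 5.1, if there exists a subvector of strictly convex constraints $\Psi_S(\cdot,\phi_0)$ of $\Psi(\cdot,\phi_0)$ then the function $\theta \mapsto \Psi_S(\cdot,\phi)$ is strictly convex for every $\phi \in B(\phi_0,r_n)$. Then, by lemma G.12, for all $p \in \mathbb{S}^d$ for which $\Xi(p,\phi_0)$ is not a singleton we have $\Psi_S(\theta,\phi_0) < 0$ (where the inequality holds componentwise) for some $\theta \in \Xi(p,\phi_0)$. This means that these constraints are not binding and the corresponding Lagrange multipliers, say $\lambda_S(p,\phi_0)$, are equal to 0 (by the complementary slackness condition). Therefore, by the uniqueness of the Lagrange multiplier (see lemma G.9), $\lambda_S(p,\phi_0) = 0$ is the optimum value of the Lagrange multipliers and $B_1$ simplifies as

$$\sup_{p\in\mathbb{S}^{\delta_n}_{ns}}\ \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})}\left\|\left[\nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta^*(p),\phi_0)\right]^T\lambda(p,\phi_0)\right\|$$
$$= \sup_{p\in\mathbb{S}^{\delta_n}_{ns}}\ \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})}\left\|\left[\nabla_\phi\Psi_L(\theta,\phi_0) - \nabla_\phi\Psi_L(\theta^*(p),\phi_0)\right]^T\lambda_L(p,\phi_0)\right\|$$
$$= \sup_{p\in\mathbb{S}^{\delta_n}_{ns}}\ \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})}\left\|\left[\nabla_\phi A_2(\phi_0) - \nabla_\phi A_2(\phi_0)\right]^T\lambda_L(p,\phi_0)\right\| = 0.$$

Let us consider term $B_2$. By the result in lemma G.5 with $W = (\mathbb{S}^{\delta_n}_{ns})^c$, there exists an $\varepsilon_n > 0$ such that $\sup_{\phi\in B(\phi_0,r_n)}\sup_{p\in(\mathbb{S}^{\delta_n}_{ns})^c}\sup_{\theta\in\Xi(p,\phi)}\|\theta - \theta^*(p)\| < \varepsilon_n$. Moreover, since $\theta \mapsto \nabla_\phi\Psi(\theta,\phi_0)$ is uniformly continuous on $\Theta$ (under assumptions 5.1 and 5.2), it follows that for every $\phi \in B(\phi_0,r_n)$ there exists an $\varepsilon_n > 0$ such that

$$\sup_{p\in(\mathbb{S}^{\delta_n}_{ns})^c}\ \sup_{\theta\in\Xi(p,\phi)}\left\|\nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta^*(p),\phi_0)\right\| < \varepsilon_n.$$

Since $B(\phi_0,r_n)$ is compact we can easily show, by using a proof similar to the one used in lemma G.6, that $\sup_{\phi\in B(\phi_0,r_n)}\sup_{p\in(\mathbb{S}^{\delta_n}_{ns})^c}\sup_{\theta\in\Xi(p,\phi)}\|\nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta^*(p),\phi_0)\| < \varepsilon_n$.
Therefore, by Lemma G.10 and compactness of $\mathbb{S}^d$ there exists a constant, say $\Lambda_2(\phi_0) > 0$, such that $\sup_{p\in(\mathbb{S}^{\delta_n}_{ns})^c}\|\lambda(p,\phi_0)\| < \Lambda_2(\phi_0)$ and $\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)} B_2$ is upper bounded by

$$\sup_{\phi\in B(\phi_0,r_n)}\ \sup_{p\in(\mathbb{S}^{\delta_n}_{ns})^c}\ \sup_{\theta\in\Xi(p,\phi)}\left\|\nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta^*(p),\phi_0)\right\|\ \sup_{p\in(\mathbb{S}^{\delta_n}_{ns})^c}\|\lambda(p,\phi_0)\| < \Lambda_2(\phi_0)\varepsilon_n.$$

We conclude that

$$\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)}\ \sup_{p\in\mathbb{S}^d}\ \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})}\left\|\left[\nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta^*(p),\phi_0)\right]^T\lambda(p,\phi_0)\right\|$$
$$= \sup_{\phi_1,\phi_2\in B(\phi_0,r_n)}(B_1 + B_2) < \Lambda_2(\phi_0)\varepsilon_n.$$

Q.E.D.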

Lemma G.14. For any $\phi_1,\phi_2 \in B(\phi_0,r_n)$ and $\tau \in (0,1)$ define $\phi_\tau = \tau\phi_1 + (1-\tau)\phi_2$ with $\phi_2 = \phi_\tau|_{\tau=0}$ and $\phi_1 = \phi_\tau|_{\tau=1}$. Let assumptions 5.1, 5.2, 5.3 (i)-(iii), 5.3 (v), 5.4 and 5.5 hold with $\delta = r_n$. Then, there exists an $N$ such that $\forall n > N$ and for all $p \in \mathbb{S}^d$ we have



T=TQ (p) 



x(p, 4>t o (p))'^<i>^(0(j>), 4>t (p))[4>i - fa 



(G.13) 



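where $\tau_0 : \mathbb{S}^d \to (0,1)$ is a measurable and differentiable function of $p$, $\tilde\theta(p) \in \Xi(p,\phi_{\tau_0(p)})$ and $\nabla_\phi\Psi(\tilde\theta(p),\phi_{\tau_0(p)})$ denotes the $(k \times d_\phi)$-matrix $\nabla_\phi\Psi(\theta,\phi)$ evaluated at $(\theta,\phi) = (\tilde\theta(p),\phi_{\tau_0(p)})$.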

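Proof. Define $L(\theta,\lambda,p,\tau_0(p)) = \langle p,\theta\rangle - \lambda(p,\phi_{\tau_0(p)})^T\Psi(\theta,\phi_{\tau_0(p)})$ for $\lambda(p,\phi) : \mathbb{S}^d \times B(\phi_0,r_n) \to \mathbb{R}^k_+$. By Corollary 5 in Milgrom and Segal (2002) (which can be applied under assumptions 5.1, 5.2 and 5.3 (i)) the function $S_{\phi_{\tau_0(p)}}(p)$ is directionally differentiable in $p$ and its directional derivatives (in direction $p$) are given by $\frac{\partial}{\partial p_+}S_{\phi_{\tau_0(p)}}(p) = \max_{\theta\in\Xi(p,\phi_{\tau_0})}\frac{\partial}{\partial p}L(\theta,\lambda,p,\tau_0(p))$ and $\frac{\partial}{\partial p_-}S_{\phi_{\tau_0(p)}}(p) = \min_{\theta\in\Xi(p,\phi_{\tau_0})}\frac{\partial}{\partial p}L(\theta,\lambda,p,\tau_0(p))$. For simplicity we have shortened $\tau_0(p)$ to $\tau_0$.
Now, by denoting $\tau_0' := \frac{\partial\tau_0}{\partial p}$ we have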



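$$\frac{\partial}{\partial p}L(\theta,\lambda,p,\tau_0(p)) = \theta - \frac{\partial}{\partial p}\lambda(p,\phi_\tau)\Big|_{\tau=\tau_0(p)}\Psi(\theta,\phi_\tau)\Big|_{\tau=\tau_0(p)} - (\phi_1-\phi_2)^T\frac{\partial}{\partial\phi}\lambda(p,\phi_\tau)\Big|_{\tau=\tau_0(p)}\Psi(\theta,\phi_\tau)\Big|_{\tau=\tau_0(p)}\tau_0' - \lambda(p,\phi_{\tau_0(p)})^T\nabla_\phi\Psi(\theta,\phi_\tau)\Big|_{\tau=\tau_0(p)}(\phi_1-\phi_2)\,\tau_0'.$$

Therefore, the partial derivative of $L(\theta,\lambda,p,\tau)$ with respect to its fourth argument is:

$$\frac{\partial}{\partial\tau}L(\theta,\lambda,p,\tau) = -(\phi_1-\phi_2)^T\frac{\partial}{\partial\phi}\lambda(p,\phi_\tau)\Psi(\theta,\phi_\tau) - \lambda(p,\phi_\tau)^T\nabla_\phi\Psi(\theta,\phi_\tau)(\phi_1-\phi_2)$$

and

$$\frac{\partial S_{\phi_\tau}(p)}{\partial\tau_+}\Big|_{\tau=\tau_0} = \max_{\theta\in\Xi(p,\phi_{\tau_0})}\left\{-(\phi_1-\phi_2)^T\frac{\partial}{\partial\phi}\lambda(p,\phi_\tau)\Big|_{\tau=\tau_0(p)}\Psi(\theta,\phi_\tau)\Big|_{\tau=\tau_0(p)} - \lambda(p,\phi_{\tau_0})^T\nabla_\phi\Psi(\theta,\phi_\tau)\Big|_{\tau=\tau_0}(\phi_1-\phi_2)\right\},$$

$$\frac{\partial S_{\phi_\tau}(p)}{\partial\tau_-}\Big|_{\tau=\tau_0} = \min_{\theta\in\Xi(p,\phi_{\tau_0})}\left\{-(\phi_1-\phi_2)^T\frac{\partial}{\partial\phi}\lambda(p,\phi_\tau)\Big|_{\tau=\tau_0(p)}\Psi(\theta,\phi_\tau)\Big|_{\tau=\tau_0(p)} - \lambda(p,\phi_{\tau_0})^T\nabla_\phi\Psi(\theta,\phi_\tau)\Big|_{\tau=\tau_0}(\phi_1-\phi_2)\right\}.$$

The first term on the right hand side of both these equations is equal to zero because $\Psi(\theta,\phi_\tau)|_{\tau=\tau_0} = 0$ for $\theta \in \Xi(p,\phi_{\tau_0})$, since this is the first order condition of the optimization problem in $\Xi(p,\phi_{\tau_0})$ evaluated at the optimum value $\theta$. More precisely, it is the partial derivative of the Lagrangian function with respect to the Lagrange multiplier. Thus,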



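$$\frac{\partial S_{\phi_\tau}(p)}{\partial\tau_+}\Big|_{\tau=\tau_0} = \max_{\theta\in\Xi(p,\phi_{\tau_0})}\left\{-\lambda(p,\phi_{\tau_0})^T\nabla_\phi\Psi(\theta,\phi_\tau)\Big|_{\tau=\tau_0}(\phi_1-\phi_2)\right\} \tag{G.14}$$

$$\frac{\partial S_{\phi_\tau}(p)}{\partial\tau_-}\Big|_{\tau=\tau_0} = \min_{\theta\in\Xi(p,\phi_{\tau_0})}\left\{-\lambda(p,\phi_{\tau_0})^T\nabla_\phi\Psi(\theta,\phi_\tau)\Big|_{\tau=\tau_0}(\phi_1-\phi_2)\right\} \tag{G.15}$$

If $p \in \mathbb{S}^d$ is such that $\Xi(p,\phi_{\tau_0})$ is a singleton then $\frac{\partial S_{\phi_\tau}(p)}{\partial\tau_+}\big|_{\tau=\tau_0} = \frac{\partial S_{\phi_\tau}(p)}{\partial\tau_-}\big|_{\tau=\tau_0}$. If $p \in \mathbb{S}^d$ is such that $\Xi(p,\phi_{\tau_0})$ is not a singleton then, by Lemma G.12, there exists an $N$ such that $\forall n > N$ and $\phi_1,\phi_2 \in B(\phi_0,r_n)$ there is a $\theta \in \Xi(p,\phi_{\tau_0})$ such that $\Psi_S(\theta,\phi_{\tau_0}) < 0$, where $\Psi_S$ denotes the vector of constraints that are strictly convex in $\theta$. This means that these constraints are not binding and the corresponding Lagrange multipliers, say $\lambda_S(p,\phi_{\tau_0})$, are equal to 0 (by the complementary slackness condition). Therefore, by the uniqueness of the Lagrange multiplier (see Lemma G.9), $\lambda_S(p,\phi_{\tau_0}) = 0$ is the optimum value of the Lagrange multipliers and the terms in (G.14) and (G.15) simplify as

$$\frac{\partial S_{\phi_\tau}(p)}{\partial\tau_+}\Big|_{\tau=\tau_0} = \max_{\theta\in\Xi(p,\phi_{\tau_0})}\left\{-\lambda_L(p,\phi_{\tau_0})^T\nabla_\phi\Psi_L(\theta,\phi_\tau)\Big|_{\tau=\tau_0}(\phi_1-\phi_2)\right\}$$

and

$$\frac{\partial S_{\phi_\tau}(p)}{\partial\tau_-}\Big|_{\tau=\tau_0} = \min_{\theta\in\Xi(p,\phi_{\tau_0})}\left\{-\lambda_L(p,\phi_{\tau_0})^T\nabla_\phi\Psi_L(\theta,\phi_\tau)\Big|_{\tau=\tau_0}(\phi_1-\phi_2)\right\},$$

with the maximum and the minimum taken over $\Xi(p,\phi_{\tau_0})$. By using the expression given in assumption 5.3 (ii) for the linear constraints we get $\nabla_\phi\Psi_L(\theta,\phi_\tau)|_{\tau=\tau_0} = \nabla_\phi A_2(\phi_\tau)|_{\tau=\tau_0}$, which does not depend on $\theta$, so that $\frac{\partial S_{\phi_\tau}(p)}{\partial\tau_+}\big|_{\tau=\tau_0} = \frac{\partial S_{\phi_\tau}(p)}{\partial\tau_-}\big|_{\tau=\tau_0}$ even when $\Xi(p,\phi_{\tau_0})$ is not a singleton.
Q.E.D.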



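Lemma G.15. For every $\phi_1,\phi_2 \in B(\phi_0,r_n)$ and $\tau \in [0,1]$ define $\phi_\tau := \tau\phi_1 + (1-\tau)\phi_2$ and:

$$f(\phi_1,\phi_2) := \sup_{p\in\mathbb{S}^d}\left(\lambda(p,\phi_{\tau_0(p)})^T\nabla_\phi\Psi(\tilde\theta(p),\phi_{\tau_0(p)}) - \lambda(p,\phi_0)^T\nabla_\phi\Psi(\theta^*(p),\phi_0)\right)[\phi_1-\phi_2]$$

where $\tau_0 : \mathbb{S}^d \to (0,1)$ is a measurable and differentiable function of $p$, $\tilde\theta(p) \in \Xi(p,\phi_{\tau_0(p)})$ and $\theta^*(p) \in \Xi(p,\phi_0)$.

Let Assumptions 5.1, 5.2, 5.3 (i)-(v), 5.4 (ii) and 5.5 hold with $\delta = r_n$. Then, there exists a constant $C > 0$ and an $N$ (independent of $\phi_1$ and $\phi_2$) such that for every $n > N$

$$\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)}\frac{f(\phi_1,\phi_2)}{\|\phi_1-\phi_2\|} \le C\varepsilon_n$$

where $\varepsilon_n \to 0$ as $n \to \infty$.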



Proof. Remark that 0(p) depends also on 0i and 02- By the Cauchy-Schwartz inequality we can 
write: 

A(p,0 TO ( p) ) T V^(^(p),0 ro(p) ) - A(p,0 o ) T V^(^(p),0o))[0i - 2 ] 



< 



V^(6(p), <J) To(p) y A(p, To(p) ) - V^(0,(p), O ) J A(p, 0o ) 
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so that 

J{ " , - (> ' 2) < ||V^(0(p),0 To(p) ) T (A(p,0 To(p) )-A(p,0 o )) II + 



1 - P2\ 

V #(0(p), ro(p) ) - V^(0*(p), O )) T A(p, O )|| =: -4i + ^t 2 - 

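We start by analyzing term $A_1$. Since $B(\phi_0,r_n)$ is convex, $\phi_1,\phi_2 \in B(\phi_0,r_n)$ implies $\phi_\tau \in B(\phi_0,r_n)$ for all $\tau \in [0,1]$. Thus,

\begin{align*}
\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} A_1
&= \sup_{\phi_1,\phi_2\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} \big\| \nabla_\phi\Psi(\theta(p),\phi_{\tau_0(p)})^T \big( \lambda(p,\phi_{\tau_0(p)}) - \lambda(p,\phi_0) \big) \big\| \\
&\leq \sup_{\phi_1,\phi_2\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} \sup_{\tau\in[0,1]} \sup_{\theta\in\Xi(p,\phi_\tau)} \big\| \nabla_\phi\Psi(\theta,\phi_\tau)^T \big( \lambda(p,\phi_\tau) - \lambda(p,\phi_0) \big) \big\| \\
&\leq \sup_{\phi\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} \sup_{\theta\in\Xi(p,\phi)} \|\nabla_\phi\Psi(\theta,\phi)\| \; \sup_{\phi\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} \|\lambda(p,\phi) - \lambda(p,\phi_0)\| \\
&\leq \sup_{\phi\in B(\phi_0,r_n)} \sup_{\theta\in\Theta} \|\nabla_\phi\Psi(\theta,\phi)\| \; \sup_{\phi\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} \|\lambda(p,\phi) - \lambda(p,\phi_0)\|.
\end{align*}

By assumptions 5.2 and 5.3 (i), $\nabla_\phi\Psi$ exists and is continuous in $(\theta,\phi) \in \Theta \times B(\phi_0,r_n)$. Since $\Theta$ and $B(\phi_0,r_n)$ are compact it follows that $\|\nabla_\phi\Psi(\theta,\phi)\|$ is uniformly bounded on $\Theta \times B(\phi_0,r_n)$, that is, there exists a constant $\bar{\psi}(\phi_0) > 0$ such that $\sup_{\phi\in B(\phi_0,r_n)} \sup_{\theta\in\Theta} \|\nabla_\phi\Psi(\theta,\phi)\| \leq \bar{\psi}(\phi_0)$. By lemma G.11 there exists $\varepsilon_n := \varepsilon(r_n) > 0$, $\varepsilon_n \to 0$ as $r_n \to 0$, such that:

\[
\sup_{p\in\mathbb{S}^d} \sup_{\phi\in B(\phi_0,r_n)} \|\lambda(p,\phi) - \lambda(p,\phi_0)\| \leq \varepsilon_n
\]

for $n$ sufficiently large, and we conclude that $\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} A_1 \leq \bar{\psi}(\phi_0)\varepsilon_n$, where $\varepsilon_n \to 0$ as $n \to \infty$. Next, let us consider term $A_2$: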

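\begin{align*}
\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} & \big\| \big( \nabla_\phi\Psi(\theta(p),\phi_{\tau_0(p)}) - \nabla_\phi\Psi(\theta_*(p),\phi_0) \big)^T \lambda(p,\phi_0) \big\| \\
&\leq \sup_{\phi_1,\phi_2\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})} \big\| \big( \nabla_\phi\Psi(\theta,\phi_{\tau_0(p)}) - \nabla_\phi\Psi(\theta_*(p),\phi_0) \big)^T \lambda(p,\phi_0) \big\|.
\end{align*}

By using the result of lemma G.13 there exists a constant $\bar{\lambda}(\phi_0) > 0$ and an $\tilde{\varepsilon}_n := \tilde{\varepsilon}(r_n) > 0$, $\tilde{\varepsilon}_n \to 0$ as $r_n \to 0$, such that

\[
\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})} \big\| \big( \nabla_\phi\Psi(\theta,\phi_{\tau_0(p)}) - \nabla_\phi\Psi(\theta_*(p),\phi_0) \big)^T \lambda(p,\phi_0) \big\| \leq \bar{\lambda}(\phi_0)\tilde{\varepsilon}_n.
\]

Collecting the two upper bounds, and redefining $\varepsilon_n$ as the sum of the two rates, we get the result.
Q.E.D.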
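The final step of the proof of Lemma G.15 can be spelled out as follows. This is a sketch under assumed notation: $\bar{\psi}(\phi_0)$ and $\bar{\lambda}(\phi_0)$ denote the constants bounding the terms $A_1$ and $A_2$, and $\varepsilon_n$, $\tilde{\varepsilon}_n$ denote the rates supplied by lemmas G.11 and G.13. The choice of $C$ shown here is one admissible choice and need not be the one the authors had in mind.

```latex
% Assumed symbols: \bar\psi(\phi_0), \bar\lambda(\phi_0) are the two constants,
% \varepsilon_n and \tilde\varepsilon_n the two rates (both tend to 0).
\begin{align*}
\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)} \frac{f(\phi_1,\phi_2)}{\|\phi_1-\phi_2\|}
  &\leq \sup_{\phi_1,\phi_2}\sup_{p\in\mathbb{S}^d} A_1
      + \sup_{\phi_1,\phi_2}\sup_{p\in\mathbb{S}^d} A_2 \\
  &\leq \bar\psi(\phi_0)\,\varepsilon_n + \bar\lambda(\phi_0)\,\tilde\varepsilon_n \\
  &\leq \underbrace{\max\{\bar\psi(\phi_0),\bar\lambda(\phi_0)\}}_{=:\,C}
        \,(\varepsilon_n + \tilde\varepsilon_n).
\end{align*}
```

Since both rates vanish as $r_n \to 0$, so does their sum, which is the rate appearing in the statement of the lemma.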



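Lemma G.16. Let $r_n = \sqrt{(\log n)/n}$. Suppose that assumptions 5.1-5.6 hold with $\delta = r_n$. Then, there exists an $N$ such that for every $n \geq N$ and $\phi_1,\phi_2 \in B(\phi_0,r_n)$ we have:

\[
\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} \Big| \sqrt{n}\big( S_{\phi_1}(p) - S_{\phi_2}(p) \big) - \sqrt{n}\,\lambda(p,\phi_0)^T \nabla_\phi\Psi(\theta_*(p),\phi_0)[\phi_1-\phi_2] \Big| = o(1),
\]

where $\theta_* : \mathbb{S}^d \to \Theta$ is a Borel measurable mapping satisfying $\theta_*(p) \in \Xi(p,\phi_0)$ for all $p \in \mathbb{S}^d$.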
Proof. On $B(\phi_0,r_n)$ we have that $\|\phi_1-\phi_2\| \leq 2r_n$ for every $\phi_1,\phi_2 \in B(\phi_0,r_n)$. Then, assumption 5.6 (i) implies that $\|\lambda(p,\phi_1) - \lambda(p,\phi_2)\| \leq 2K_1 r_n$ and therefore

\[
\sup_{\phi\in B(\phi_0,r_n)} \|\lambda(p,\phi) - \lambda(p,\phi_0)\| \leq K_1 r_n.
\]

This implies that the rate $\varepsilon_n$ in lemma G.11 is $\varepsilon_n = O(r_n)$.

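A similar argument may be applied to show that the rate in lemma G.13 is also $O(r_n)$. To this aim, first consider term $A_1$ in the proof of lemma G.13:

\begin{align*}
\sup_{p\in\mathbb{S}^d} \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})} & \big\| \big[ \nabla_\phi\Psi(\theta,\phi_{\tau_0(p)}) - \nabla_\phi\Psi(\theta,\phi_0) \big]^T \lambda(p,\phi_0) \big\| \\
&\leq \sup_{p\in\mathbb{S}^d} \sup_{\phi\in B(\phi_0,r_n)} \sup_{\theta\in\Xi(p,\phi)} \|\nabla_\phi\Psi(\theta,\phi) - \nabla_\phi\Psi(\theta,\phi_0)\| \; \sup_{p\in\mathbb{S}^d} \|\lambda(p,\phi_0)\| \\
&\leq \sup_{\phi\in B(\phi_0,r_n)} \sup_{\theta\in\Theta} \|\nabla_\phi\Psi(\theta,\phi) - \nabla_\phi\Psi(\theta,\phi_0)\| \; \sup_{p\in\mathbb{S}^d} \|\lambda(p,\phi_0)\| = O(r_n)
\end{align*}

since $\sup_{p\in\mathbb{S}^d} \|\lambda(p,\phi_0)\| = O(1)$ (by lemma G.10) and since assumption 5.6 (ii) implies $\sup_{\theta\in\Theta} \|\nabla_\phi\Psi(\theta,\phi) - \nabla_\phi\Psi(\theta,\phi_0)\| \leq K_2\|\phi-\phi_0\| \leq K_2 r_n$ for every $\phi \in B(\phi_0,r_n)$. Now, consider term $A_2$ in the proof of lemma G.13. In Case I, where $\Xi(p,\phi)$ is a singleton for every $(p,\phi) \in \mathbb{S}^d \times B(\phi_0,r_n)$, we have:

\begin{align*}
\sup_{p\in\mathbb{S}^d} \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})} & \big\| \big[ \nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta_*(p),\phi_0) \big]^T \lambda(p,\phi_0) \big\| \\
&\leq \sup_{\phi\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} \|\nabla_\phi\Psi(\Xi(p,\phi),\phi_0) - \nabla_\phi\Psi(\theta_*(p),\phi_0)\| \; \sup_{p\in\mathbb{S}^d} \|\lambda(p,\phi_0)\| \\
&\leq K_3 \sup_{\phi\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} \|\Xi(p,\phi) - \theta_*(p)\| \; \sup_{p\in\mathbb{S}^d} \|\lambda(p,\phi_0)\|
\end{align*}

under assumption 5.6 (iii). By lemma G.6, under assumption 5.6 (iv), we have the upper bound $\sup_{\phi\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} \|\Xi(p,\phi) - \theta_*(p)\| = O(r_n)$, so that we conclude that

\[
\sup_{p\in\mathbb{S}^d} \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})} \big\| \big[ \nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta_*(p),\phi_0) \big]^T \lambda(p,\phi_0) \big\| = O(r_n).
\]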

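For Case II in the proof of lemma G.13, the analysis of term $B_1$ does not change, while for term $B_2$ we obtain:

\begin{align*}
B_2 &= \sup_{p\in(\mathbb{S}^d_s)^c} \sup_{\theta\in\Xi(p,\phi_{\tau_0(p)})} \big\| \big[ \nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta_*(p),\phi_0) \big]^T \lambda(p,\phi_0) \big\| \\
&\leq \sup_{\phi\in B(\phi_0,r_n)} \sup_{p\in(\mathbb{S}^d_s)^c} \sup_{\theta\in\Xi(p,\phi)} \|\nabla_\phi\Psi(\theta,\phi_0) - \nabla_\phi\Psi(\theta_*(p),\phi_0)\| \; \sup_{p\in\mathbb{S}^d} \|\lambda(p,\phi_0)\| \\
&\leq K_3 \sup_{\phi\in B(\phi_0,r_n)} \sup_{p\in(\mathbb{S}^d_s)^c} \sup_{\theta\in\Xi(p,\phi)} \|\theta - \theta_*(p)\| \, O(1) = O(r_n)
\end{align*}

under assumptions 5.6 (iii) and (iv) and by lemma G.6.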
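By replacing these rates in the proof of lemma G.15 we get

\[
\sup_{p\in\mathbb{S}^d} \Big| \Big( \lambda(p,\phi_{\tau_0(p)})^T \nabla_\phi\Psi(\theta(p),\phi_{\tau_0(p)}) - \lambda(p,\phi_0)^T \nabla_\phi\Psi(\theta_*(p),\phi_0) \Big)[\phi_1-\phi_2] \Big| = O\big( r_n\|\phi_1-\phi_2\| \big).
\]

This and (E.4) give: for all $\phi_1,\phi_2 \in B(\phi_0,r_n)$,

\begin{align*}
\sup_{p\in\mathbb{S}^d} & \Big| \big( S_{\phi_1}(p) - S_{\phi_2}(p) \big) - \lambda(p,\phi_0)^T \nabla_\phi\Psi(\theta_*(p),\phi_0)[\phi_1-\phi_2] \Big| \\
&\leq \sup_{p\in\mathbb{S}^d} \Big| \Big( \lambda(p,\phi_{\tau_0(p)})^T \nabla_\phi\Psi(\theta(p),\phi_{\tau_0(p)}) - \lambda(p,\phi_0)^T \nabla_\phi\Psi(\theta_*(p),\phi_0) \Big)[\phi_1-\phi_2] \Big| = O(r_n^2),
\end{align*}

and

\[
\sup_{\phi_1,\phi_2\in B(\phi_0,r_n)} \sup_{p\in\mathbb{S}^d} \Big| \sqrt{n}\big( S_{\phi_1}(p) - S_{\phi_2}(p) \big) - \sqrt{n}\,\lambda(p,\phi_0)^T \nabla_\phi\Psi(\theta_*(p),\phi_0)[\phi_1-\phi_2] \Big| = O(\sqrt{n}\,r_n^2) = O\Big( \frac{\log(n)}{\sqrt{n}} \Big),
\]

which converges to 0.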
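The rate arithmetic behind the last display can be checked directly. The sketch below assumes $r_n = \sqrt{(\log n)/n}$, which is the choice under which the stated bound $\log(n)/\sqrt{n}$ is obtained, and uses that any two points of the ball $B(\phi_0,r_n)$ are at distance at most $2r_n$ from each other.

```latex
% Assumptions: r_n = \sqrt{(\log n)/n}; \|\phi_1-\phi_2\| \le 2 r_n on B(\phi_0,r_n).
\begin{align*}
r_n\,\|\phi_1-\phi_2\| &\leq 2\,r_n^2 = \frac{2\log n}{n}, \\
\sqrt{n}\;r_n^2 &= \sqrt{n}\cdot\frac{\log n}{n} = \frac{\log n}{\sqrt{n}}
  \;\longrightarrow\; 0 \qquad (n\to\infty).
\end{align*}
```

The convergence in the last line holds because $\log n$ grows more slowly than any positive power of $n$, in particular more slowly than $\sqrt{n}$.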